Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present specification, as detailed in the appended claims.
The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present description. The term "if" as used herein may be interpreted as "upon", "when", or "in response to a determination", depending on the context.
User information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in this disclosure are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation portals are provided for users to choose to grant or deny authorization.
Cgroup is a kernel feature in an operating system such as Linux, and may be used to limit, record, and isolate the usage of hardware resources of a control group, where the hardware resources include, but are not limited to, CPU (Central Processing Unit), memory, disk I/O (input/output), etc. With cgroup, a system administrator can take control of hardware resources, such as limiting the proportion of processor and/or memory used by a particular process, or counting its resource usage. Taking CPU resource allocation as an example, the functions provided by cgroup include limiting the total CPU usage time of all processes in a control group and binding the processes in a control group to specific CPU cores and/or memory nodes.
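As an illustrative sketch only (not part of the embodiment): the cgroup v2 `cpu.max` interface expresses such a CPU limit as a quota/period pair in microseconds. The small helper below, whose name is hypothetical, converts a `cpu.max` value into the fraction of one CPU the control group may consume:

```python
def cpu_limit_fraction(cpu_max: str) -> float:
    """Parse a cgroup v2 cpu.max value ("<quota> <period>" or "max <period>")
    into the fraction of one CPU the control group may consume."""
    quota, period = cpu_max.split()
    if quota == "max":          # "max" means no bandwidth limit is imposed
        return float("inf")
    return int(quota) / int(period)

# e.g. a group limited to 50 ms of CPU time per 100 ms period
print(cpu_limit_fraction("50000 100000"))  # 0.5
```

This illustrates the fixed nature of such a limit discussed below: the fraction stays constant regardless of whether the group's processes are busy or idle.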
However, the cgroup-based processor scheduling policy is relatively fixed, which may leave the processor resources corresponding to some cgroups idle or only partially busy. The adopted cgroup processor scheduling scheme is therefore not flexible enough, and the processor cannot be finely managed. In a production environment, although cgroup configurations may be dynamically adjusted, these configurations are often empirical values verified through practice and large-scale deployment, and frequent changes are unlikely to be made.
Therefore, it is also conceivable to perform hardware resource control through virtualization techniques, which allow a physical CPU on a physical machine to be abstracted and divided into a plurality of virtual CPUs (vCPUs for short). As an example, KVM (Kernel-based Virtual Machine), a virtualization management tool in the Linux kernel, uses a Linux kernel module and some user-space tools to implement virtualization, allowing multiple virtual machines (VMs) to run on a Linux system, each capable of independently running different operating systems and applications. KVM may assign virtual CPUs to the virtual machines running on a host, with the virtual CPUs mapped and scheduled onto physical CPUs. In KVM, the scheduling of virtual CPUs may be based on a variety of policies, including round-robin, fair scheduling, and real-time scheduling, depending on the implementation of the virtualization management software. The ultimate goal is to ensure the performance of the VMs while exploiting the maximum performance of the physical hardware. KVM uses the Linux kernel scheduler to place virtual CPUs onto physical CPUs, and also supports advanced functions such as CPU-intensive task prioritization and memory page migration.
Virtual CPUs may be dynamically migrated to different physical cores in order to utilize physical resources more efficiently. This is called virtual CPU migration. For example, if the virtual CPU on one physical core is lightly loaded and the virtual CPU on another physical core is heavily loaded, a portion of the load may be migrated to the lightly loaded core to achieve load balancing. Dynamic scheduling also allows the number of virtual CPUs to be modified dynamically. For example, when the load on the running virtual CPUs is high, the number of virtual CPUs may be increased to meet the VM's demand for resources. An administrator may use various monitoring tools to track resource usage and make optimization adjustments based on such data.
Based on this, the scheduling of physical processors may be achieved through the idea of virtualization. The embodiments of the present specification provide a processor scheduling scheme that can switch the operating system of a running host machine from a root mode to a virtualization mode, so as to create, in the virtualization mode, the virtual processor where a target process is located and a new virtual processor; predict idle time information for the target process bound to a physical processor; and, based on that information, take the virtual processor where the idle target process is located out of running on the physical processor and schedule the physical processor to run the new virtual processor. The embodiments can thus make the target process bound to the physical processor yield the physical processor when idle, thereby improving the utilization rate of the physical processor and flexibly and elastically providing the computing power of the physical processor to the upper layer through virtual processors.
As shown in fig. 1, in a method for scheduling processors of a physical machine according to an exemplary embodiment of the present disclosure, the physical machine includes a physical processor, and the physical processor is bound with a target process. The method may include the following steps:
In step 102, the running host operating system is switched from root mode to virtualization mode, in which the virtual processor where the target process is located and the new virtual processor are created.
In step 104, idle time information is obtained, the idle time information characterizing an idle time of the target process in an idle state.
In step 106, based on the idle time information, the virtual processor where the target process in the idle state is located is taken out of running from the physical processor, and the physical processor is scheduled to run the new virtual processor.
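The three steps above can be sketched as a toy simulation. All names below are hypothetical, and a real implementation resides in the virtualization layer rather than in user-space code; this only illustrates the scheduling decision of steps 104 and 106:

```python
class PhysicalCPU:
    """Toy model of one physical processor managed by the virtualization layer."""

    def __init__(self, bound_vcpu):
        # Step 102 has already happened: the physical CPU was switched to a
        # virtual CPU on which the bound target process runs.
        self.running = bound_vcpu

    def schedule(self, target_idle_until, now, new_vcpu):
        # Steps 104/106: if the target process reports idle time that has not
        # yet elapsed, take its vCPU off the physical CPU and run the new vCPU.
        if target_idle_until is not None and now < target_idle_until:
            self.running = new_vcpu
        return self.running

cpu = PhysicalCPU(bound_vcpu="vcpu-target")
# No idle information: the target process keeps the physical CPU.
assert cpu.schedule(target_idle_until=None, now=0, new_vcpu="vcpu-new") == "vcpu-target"
# Target reported idle until t=10; at t=3 the new vCPU is scheduled instead.
assert cpu.schedule(target_idle_until=10, now=3, new_vcpu="vcpu-new") == "vcpu-new"
```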
By way of example, the hardware resources of a physical machine include, but are not limited to, one or a combination of processors including a CPU, a graphics processor (Graphics Processing Unit, GPU), a data processor (Data Processing Unit, DPU), a tensor processor (Tensor Processing Unit, TPU), a cloud infrastructure processor (Cloud Infrastructure Processing Unit, CIPU), and an application-specific integrated circuit (ASIC), among others. The hardware resources of the physical machine may also include any type of storage medium, as well as other components such as I/O devices, communication components, displays, power components, and audio components; the present embodiment is not limited in this regard.
The host operating system is system software running on the physical machine, and is responsible for managing hardware resources and providing a platform so that applications can run on the operating system and access and utilize the hardware resources to perform specific tasks through the operating system. As an example, the host operating system may be a Linux operating system or the like, which is not limited in this embodiment.
The processor in the embodiments of the present disclosure includes, but is not limited to, a CPU, a GPU, a DPU, a TPU, etc. In the subsequent embodiments, for convenience of description, a CPU is mostly used as an example.
To achieve the purpose of providing the elastic computing power, embodiments of the present disclosure may first create a new virtual processor directly for the running host operating system, and then implement the elastic computing power by scheduling the new virtual processor to run on the physical processor.
In this embodiment, when the running host operating system is switched from the root mode to the virtualization mode, each running physical CPU is switched to a virtual processor. Compared with KVM virtualization, this mode is more lightweight: because the running physical CPU is switched to a virtual CPU, the virtualization layer can continually schedule that virtual CPU to continue running on the physical CPU. Based on this, the present embodiment creates one or more virtual processors in the virtualization mode.
In some examples, as shown in fig. 2A, which is a diagram of a physical machine according to an exemplary embodiment of the present specification, the physical machine architecture of the embodiment provides two operation modes, namely a root mode and a virtualization mode, for the host operating system in a running state. A virtualization layer may be inserted between the hardware resources of the physical machine and the host operating system, and the running host operating system is switched from the root mode to the virtualization mode by the virtualization layer. Various functions can thus be flexibly provided for the running host operating system without reinstalling the operating system.
As an example, the host operating system corresponds to a root mode and a virtualization mode. The root mode is a working mode in which the host operating system has direct access to hardware resources; the virtualization mode is a working mode in which the virtualization layer virtualizes the hardware resources and, in place of the host operating system, schedules and accesses the virtualized resources. In the virtualization mode, direct access to hardware resources by the host operating system is limited. The virtualization mode may also be referred to as a non-root mode relative to the root mode, and accordingly, the host operating system in the virtualization mode may also be referred to as a guest operating system.
In this embodiment, a lightweight, powerful, and efficient virtualization layer may be interposed between the hardware resources and the host operating system. The virtualization layer may be developed in advance, and the manner of inserting it is not limited. For example, after the host operating system is installed on the physical machine, the virtualization layer may be inserted while the host operating system is in normal operation.
As an example, the virtualization layer may be a software module within the host operating system. When the host operating system is in the root mode, it may be switched from the root mode to the virtualization mode by this software module, after which the host operating system becomes a guest operating system and the management of the hardware resources of the physical machine is taken over by the virtualization layer. This is what is meant by inserting the virtualization layer between the hardware resources and the host operating system: the host operating system uses the hardware resources through the virtualization layer. In practical applications, the switch from the root mode to the virtualization mode may be triggered at various times, for example, when the resource utilization rate of the physical CPU is low, when a predetermined condition is triggered, or when a set instruction is received; this embodiment is not limited in this regard.
After the host operating system switches from the root mode to the virtualization mode, taking the CPU as an example, each existing physical CPU of the physical machine can be switched to a corresponding virtual CPU. For the physical CPU bound to the target process in this embodiment, the switched virtual CPU is the virtual CPU where the target process is located. In addition, a new virtual CPU is further created in the virtualization mode in this embodiment. One or more new virtual CPUs may be created, flexibly configured according to actual needs, which is not limited in this embodiment.
As an example, assuming that the physical machine includes 4 physical CPUs, after the host operating system switches from the root mode to the virtualization mode, the 4 physical CPUs may be respectively switched to corresponding virtual CPUs, and n virtual CPUs may additionally be created. To distinguish them from the virtual CPUs switched from the physical CPUs, the additionally created virtual CPUs are referred to as newly created virtual CPUs in this embodiment. Thus, the total number of virtual CPUs is the 4 switched virtual CPUs plus the n newly created virtual CPUs.
As one example, during the switch of the host operating system from the root mode to the virtualization mode, the virtualization layer may create a corresponding physical description structure for each physical CPU and a corresponding virtualization description structure for each virtual CPU, where the physical description structure of each physical CPU corresponds one-to-one to a virtualization description structure. The physical description structure is used to store the context information of the physical CPU running in the root mode at the time of the mode switch, and the virtualization description structure synchronizes the context information from the physical description structure at the time of the mode switch. The virtualization layer manages the virtual CPUs switched from the physical CPUs through the virtualization description structures.
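The relationship between the two description structures can be sketched as follows. The structures, field names, and the context contents are hypothetical stand-ins; the real structures are internal to the virtualization layer:

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalDesc:
    """Per-physical-CPU structure: saves the root-mode context at mode switch."""
    cpu_id: int
    context: dict = field(default_factory=dict)

@dataclass
class VirtDesc:
    """Per-virtual-CPU structure: synchronized from the physical one at switch."""
    vcpu_id: int
    context: dict = field(default_factory=dict)

def switch_to_virtualization_mode(phys_context: dict, cpu_id: int):
    # Save the context the physical CPU was running with in root mode, then
    # synchronize it into the virtualization description structure so the
    # switched vCPU resumes exactly where the physical CPU left off.
    phys = PhysicalDesc(cpu_id, dict(phys_context))
    virt = VirtDesc(cpu_id, dict(phys.context))
    return phys, virt

phys, virt = switch_to_virtualization_mode({"rip": 0x1000, "rsp": 0x7FFF}, cpu_id=0)
assert virt.context == phys.context
```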
As an example, there may be various ways to create a new virtual processor. Since a virtual processor switched from a physical processor corresponds to a virtualization description structure, a new virtualization description structure may be created in order to create a new virtual processor, and optionally a series of registers may be configured; this may be implemented by a virtual-processor creation function of the virtualization layer, which is not limited in this embodiment. The virtualization description structure of a new virtual processor is similar to the virtualization description structure corresponding to a physical CPU described above; for example, the fields of the two are the same, but the specific field values differ. Each new virtualization description structure represents one new virtual processor, and the corresponding new virtual processor can be managed through its virtualization description structure. The specific management mode is not limited in this embodiment.
As an example, to the virtualization layer at the bottom of the physical machine in this embodiment, a virtual processor is an execution task that can be scheduled to execute on a physical CPU. A newly created virtual processor can be used to execute any task, like a virtual machine of KVM, except that in this embodiment the virtual processor is created in a manner different from KVM and is managed by the virtualization layer. As an example, various user tasks, such as ordinary user-mode processes, may be executed on the virtual processor of the present embodiment, and a user process may be bound to a newly created virtual processor as needed; this is not limited in this embodiment. Therefore, by scheduling the newly created virtual processor to run on the physical CPU, elastic computing power can be provided for the upper layer, and how the underlying virtualization layer schedules the newly created virtual processor according to the busy state of the physical CPU is transparent to upper-layer tasks, so transparency to the original running mode can be maintained.
As an example, a scheduler for scheduling physical processors may be implemented in the virtualization layer to schedule the tasks run by the physical processors, such as the virtual CPUs described above (the virtual CPUs switched from the physical CPUs plus the n newly created virtual CPUs). The method of the embodiment can be applied in a scheduler of the virtualization layer; optionally, the scheduler may run on each physical CPU, i.e. the method of the embodiment can be applied to each physical CPU.
As an example, the tasks running on a physical CPU may be the virtual CPU switched from that physical CPU, any of the n newly created virtual CPUs, and tasks of the operating system, and priorities may be set in that order. In one scheduling period, the virtualization layer may allocate running durations to the three kinds of tasks based on their respective priorities. For example, assuming that one scheduling period is 10 ms, the period may be allocated to the three kinds of tasks in proportion: the virtual CPU switched from the physical CPU is allocated the largest running duration, then the newly created virtual CPU, and finally the tasks of the operating system.
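Such a proportional split of one scheduling period can be sketched as below. The weights are purely hypothetical examples of "largest share first"; the embodiment does not specify concrete proportions:

```python
def allocate_period(period_ms, weights):
    """Split one scheduling period among task classes in proportion to their
    weights (a higher weight means a higher priority and a larger share)."""
    total = sum(weights.values())
    return {task: period_ms * w / total for task, w in weights.items()}

# Hypothetical weights: the switched vCPU gets the largest share, then the
# newly created vCPU, and finally the operating-system tasks.
slices = allocate_period(10, {"switched_vcpu": 6, "new_vcpu": 3, "os_tasks": 1})
assert slices["switched_vcpu"] > slices["new_vcpu"] > slices["os_tasks"]
assert abs(sum(slices.values()) - 10) < 1e-9   # the whole period is allocated
```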
As an example, there are multiple physical processors on the physical machine, n newly created virtual CPUs may be scheduled on any physical processor, or the range of physical processors that the newly created virtual CPU can run may be set as desired. When a newly created virtual CPU runs on a physical processor, the schedulers running on other physical processors will not schedule the newly created virtual CPU to run.
The embodiment does not limit the target process bound to the physical processor, and may be determined based on an actual application scenario, for example, the physical processor and the bound target process may be in a one-to-one relationship, and of course, other situations may occur in the actual application, such as many-to-one situations, etc., which is not limited in the embodiment.
The physical processor bound to the target process can be exclusive to the target process and run no other processes. For example, the target process may be a process corresponding to a service requiring high-performance operation, such as a process corresponding to a high-performance network card; because such processes carry services requiring high-performance operation, they are allocated exclusive physical processors. During running, however, situations may occur in which no actual service is executed; for example, the process polls while waiting for an event and cannot execute the actual service until the event arrives. Such a waiting period is actually a period in which the process is in an idle state; since the process performs no actual service during this period, physical processor resources are wasted. In order to utilize the resources of the physical processor during this period, in this embodiment, after the switch to the virtualization mode in step 102, the physical processor is switched to a virtual processor, and the target process corresponds to that switched virtual processor. By creating a new virtual processor, the virtualization layer can schedule the physical processor to run the virtual processors (including the virtual processor where the target process is located, switched from the physical processor, and the new virtual processor). Then, through steps 104 to 106, the new virtual processor can be run while the target process is idle, thereby improving the utilization rate of the physical processor.
In practice, the idle time information may be predicted in a variety of ways, for example, based on historical operating conditions of the target process on the physical processor. The prediction process may be performed by the target process and notified to the virtualization layer, or may be performed by the virtualization layer.
As an example, the idle time information may be predicted after the target process enters an idle state. For example, during running, the target process needs to execute a function for realizing the actual service and a function for polling for a certain event. If the function for implementing the actual service is executed, the target process is in an active state, i.e., a busy state, not an idle state. If the function for polling while waiting for an event is executed, the target process is in an idle state. On this basis, it may be determined whether the target process is in an idle state while executing on the physical processor.
As an example, the target process may notify the virtualization layer after entering an idle state, and the virtualization layer then predicts the idle time information of the target process. Alternatively, the target process may itself predict the idle time information after entering the idle state and notify the virtualization layer.
Based on the above, through the cooperation of the target process, the embodiment can accurately perceive the busy and idle states of the process and better utilize physical CPU resources, thereby improving resource utilization.
In practical applications, there are many ways to communicate the idle state or idle time information between the target process and the virtualization layer. For example, the virtualization layer may be notified by way of a system call; for instance, the target process may invoke a vmcall function of the virtualization layer, and the virtualization layer is notified through this call.
In other examples, to reduce the overhead of communication between the target process and the virtualization layer, the present embodiment allocates, in the memory, a shared storage area that can be shared between the target process and the virtualization layer. For example, the acquiring of idle time information may include:
The shared storage area comprises a first sub-area used for storing idle time information, where the idle time information is determined by the target process after it enters an idle state while running on the physical processor and is written into the first sub-area;
The idle time information is read from the first sub-area of the shared storage area.
The virtualization layer can manage the hardware resources of the physical machine, and the shared storage area can be allocated in the memory by the virtualization layer. Optionally, the shared storage area may contain a first sub-area for storing the idle time information, which may be written only by the target process and only read by the virtualization layer, avoiding read-write contention. The size of the first sub-area may be set according to actual needs, which is not limited in this embodiment.
Thus, after predicting the idle time information, the target process can write it into the first sub-area, and the virtualization layer can read it by accessing this area. Optionally, storing no idle time information in the first sub-area, or storing a flag indicating that there is no idle time information, may indicate to the virtualization layer that the target process has no idle time information. In the above manner, the target process and the virtualization layer can communicate with lower overhead, so that the virtualization layer can determine the idle state and idle time information of the target process.
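The first sub-area exchange can be sketched with a toy in-memory model (the class, the sentinel value 0 for "no idle time information", and all names are hypothetical; the real sub-area is a region of shared memory):

```python
class SharedArea:
    """Toy model of the shared storage area: the first sub-area holds the
    idle termination time, with 0 meaning "no idle time information"."""
    NO_IDLE_INFO = 0

    def __init__(self):
        self.first = self.NO_IDLE_INFO

    # Target-process side: write-only.
    def write_idle_end(self, t1, k):
        self.first = t1 + k          # idle termination time = now + duration

    # Virtualization-layer side: read-only.
    def read_idle_end(self):
        return None if self.first == self.NO_IDLE_INFO else self.first

area = SharedArea()
assert area.read_idle_end() is None    # no idle info published yet
area.write_idle_end(t1=100, k=25)      # target predicts idle until t1 + k
assert area.read_idle_end() == 125
```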
Optionally, the idle time information may be characterized in a number of ways; for example, it may be an idle termination time. As an example, if the target process determines at the current time t1 that it is in an idle state and determines that the idle duration is k, the idle termination time may be t1+k. Optionally, the determined idle termination time t1+k may be written directly into the first sub-area as the idle time information, or may be appropriately reduced from t1+k as needed, for example when the predicted idle duration k is large. This may be flexibly configured in actual implementations, and this embodiment is not limited thereto.
In other examples, the shared storage area may further include a second sub-area for storing indication information;
The target process is used for accessing the second sub-area of the shared storage area while running on the physical processor. If the indication information is read and the idle termination time indicated by the idle time information has not yet been reached, the target process suspends the prediction of the idle time information; if the indication information is not read, the target process predicts the idle time information and writes it into the first sub-area.
Optionally, the shared storage area may be a contiguous storage area, and the sizes of the first and second sub-areas and their order within the shared storage area may be flexibly configured according to actual needs. As one example, the first 64 bits of the shared storage area may be the first sub-area, and the 64 bits after the first sub-area may be the second sub-area. Of course, in practical applications, the shared storage area may also be a non-contiguous storage area, with the first and second sub-areas arranged accordingly; this embodiment is not limited in this regard.
The second sub-area is used for storing indication information representing that the virtualization layer has read the idle time information; it can be read only by the target process and written only by the virtualization layer, avoiding read-write contention. In this way, the virtualization layer can inform the target process that it has read the idle time information. After reading the idle time information, the virtualization layer can schedule the physical processor to stop running the target process and run the new virtual processor for a certain period of time, after which the new virtual processor exits running so that the target process can run again. The new virtual processor, however, does not necessarily run up to the idle termination time indicated by the idle time information.
For example, when the current time is t1 and the idle termination time predicted by the target process is t1+k, after the idle termination time is written into the first sub-area, the virtualization layer schedules the new virtual processor to run for a certain period of time and then exit running, and then reschedules the target process to run on the physical processor. If the current time is now t2 and the target process finds that t2 is less than t1+k, that is, the previously predicted idle termination time has not yet been reached, the target process need not update the idle time information at this time.
If, by accessing the second sub-area, the target process finds that no indication information indicating that the virtualization layer has read the idle time information is stored, the target process can predict the idle time information in the idle state and write it into the first sub-area.
Based on this, through the above embodiment, the target process and the virtualization layer can communicate with lower overhead, so that the target process can determine whether the virtualization layer has read the idle time information.
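One polling iteration of the target process against the second sub-area can be sketched as follows. The dictionary model, the return strings, and the fixed prediction are all hypothetical stand-ins for illustration:

```python
def predict_idle_duration():
    return 50   # hypothetical fixed prediction of the idle duration k

def target_poll_step(area, now):
    """One polling iteration of the idle target process.

    area: dict with 'first' (idle termination time, or None when cleared) and
          'second' (True once the virtualization layer has read 'first').
    Returns a label for the action taken, for illustration only.
    """
    if area["second"] and area["first"] is not None and now < area["first"]:
        # Indication read and idle termination time not yet reached:
        # suspend prediction of the idle time information.
        return "suspend-prediction"
    # No indication read (or idle end reached): predict and publish again.
    area["first"] = now + predict_idle_duration()
    return "wrote-idle-info"

area = {"first": None, "second": False}
assert target_poll_step(area, now=0) == "wrote-idle-info"
area["second"] = True                  # virtualization layer has read it
assert target_poll_step(area, now=10) == "suspend-prediction"
```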
In some examples, the scheduling of the physical processor to run the new virtual processor may include:
Allocating a current running duration for the current scheduling to the new virtual processor based on the idle time indicated by the idle time information, and scheduling the physical processor to run the new virtual processor based on the current running duration, so that the new virtual processor stops running on the physical processor when its running time reaches the current running duration.
In practical applications, the running duration, that is, the time slice, allocated to the new virtual processor may be flexibly set as needed; for example, a fixed time slice may be set for every scheduling period, different time slices may be set for different scheduling periods, or the time slice may be set according to the running condition of the physical processor, and the embodiment is not limited to this. In this embodiment, the current running duration allocated to the new virtual processor is determined based on the idle time indicated by the idle time information and extends at most to the idle termination time, so that the new virtual processor does not affect the execution of the target process.
After the current running duration is allocated, the physical processor can be scheduled to run the new virtual processor based on the current running duration, so that the new virtual processor stops running on the physical processor when its running time reaches the current running duration, without affecting the service of the target process.
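Capping the current running duration at the idle termination time can be sketched in one line (the function name and the default time slice are hypothetical):

```python
def current_running_duration(now, idle_end, default_slice):
    """Time slice granted to the new virtual processor for this scheduling:
    never extends past the predicted idle termination time of the target
    process, and never negative."""
    return max(0, min(default_slice, idle_end - now))

assert current_running_duration(now=100, idle_end=125, default_slice=10) == 10
assert current_running_duration(now=120, idle_end=125, default_slice=10) == 5
assert current_running_duration(now=130, idle_end=125, default_slice=10) == 0
```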
In practical applications, while the new virtual processor runs on the physical processor, events may occur that need to be handled by the target process. In order for the target process to resume running in time, in some examples, after the step of taking the virtual processor where the idle target process is located out of running from the physical processor, the method may further include:
If a preset running event sent to the virtual processor where the target process is located is detected, taking the new virtual processor out of running from the physical processor, and scheduling the virtual processor where the target process is located to run on the physical processor.
Optionally, the preset running event sent to the virtual processor where the target process is located may be configured as needed, which is not limited in this embodiment. By way of example, there may be events such as IPIs (Inter-Processor Interrupts) or external interrupts sent to the target process, which may be captured by the virtualization layer. As an example, the event here may be the event that the target process polls for, as mentioned in the foregoing example.
When such an event is detected, the embodiment may take the new virtual processor out of running from the physical processor. Specifically, an interrupt instruction, for example a custom virtual interrupt instruction, may be sent to the running new virtual processor so that it exits running on the physical processor, and the virtual processor where the target process is located may then be scheduled to resume running on the physical processor, so as not to affect the service of the target process.
In some examples, the shared memory region further comprises a third sub-region for storing an emergency dispatch flag, the method further comprising:
and if the preset running event sent to the virtual processor where the target process is located is detected, writing an emergency dispatch flag into the third sub-region, where the target process, after reading the emergency dispatch flag stored in the third sub-region while running on the physical processor, clears the information stored in the first sub-region and suspends the prediction of the idle time information.
The specific implementation of the third sub-region is not limited in this embodiment, and its size and its order relative to the other two sub-regions in the shared memory area may be flexibly set according to actual needs.
The third sub-region in this embodiment may be read-only for the target process and write-only for the virtualization layer, so as to avoid read-write contention. It is used to store the emergency dispatch message transmitted by the virtualization layer to the target process. In the above example, when the virtualization layer detects the preset running event sent to the virtual processor where the target process is located, it may exit the new virtual processor from running on the physical processor and schedule the virtual processor where the target process is located to run on the physical processor, so that the target process processes the event next. In order not to affect the service of the target process, the target process may access the third sub-region and, on finding that the emergency dispatch flag is stored there, clear the information stored in the first sub-region regardless of whether the stored idle time information has expired, and suspend the prediction of the idle time information. The virtualization layer may then find that the information stored in the first sub-region has been cleared, determine that the target process is not currently in an idle state, and suspend scheduling the physical processor to run the new virtual processor.
Optionally, the target process may then continue to run for a certain period of time. For example, a threshold may be set; after the continuous running time reaches the threshold and the target process has no service to execute and has been in an idle state for some time, the idle time information may be predicted again, so that the running of the physical processor is yielded once more.
In some examples, after the step of scheduling the physical processor to run the virtual processor, the method may further include:
after the new virtual processor exits from running on the physical processor, accumulating the running time of the new virtual processor;
and in response to detecting that the total running time of the new virtual processor within the latest set time period reaches a set time threshold, suspending the exiting of the virtual processor where the target process in an idle state is located from the physical processor.
In this embodiment, in order to prevent the new virtual CPU from running on the physical processor for too long and affecting the normal service running of the target process, the virtualization layer may further count the total duration actively yielded by the target process within a period of time; if too much has been yielded, the active yielding may be suspended for a period of time, so as to avoid service performance fluctuation caused by an inaccurate idle-time prediction algorithm.
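The accounting described above can be sketched as follows. This is a minimal illustration, assuming a sliding-window bookkeeping scheme; the class, field names, and window mechanics are hypothetical and not mandated by this embodiment.

```python
class YieldAccountant:
    """Tracks how long the new virtual CPU has run on yielded time within
    a sliding window, and reports when further yielding should pause.
    All names and the window scheme are illustrative assumptions."""

    def __init__(self, window, max_yield):
        self.window = window        # the "latest set time period"
        self.max_yield = max_yield  # the "set time threshold"
        self.runs = []              # (end_time, duration) per exited run

    def record_run(self, end_time, duration):
        # Called each time the new virtual CPU exits the physical CPU.
        self.runs.append((end_time, duration))

    def yielding_suspended(self, now):
        # Keep only runs ending inside the window, then sum them up.
        self.runs = [(t, d) for (t, d) in self.runs if t >= now - self.window]
        return sum(d for (_, d) in self.runs) >= self.max_yield
```

Because old records fall out of the window, yielding resumes automatically once the recent total drops back below the threshold.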
The following examples are provided for illustration. In the following embodiments, a CPU is taken as an example of the processor; it will be appreciated that in practical applications, the CPU in this embodiment may be replaced by another type of processor, for example a GPU, which is not limited in this embodiment.
The embodiment can be applied to a virtualization layer, and can switch a running host operating system from a root mode to a virtualization mode and create at least one virtual CPU in the virtualization mode.
Since a newly created virtual CPU does not correspond to a fixed physical CPU, the physical CPUs on which each newly created virtual CPU may run can be set first, and the scheduler can then schedule each virtual CPU to run on the physical CPUs so set.
The scheduler of the virtualization layer may set a scheduling priority for each newly created virtual CPU; the priority of a newly created virtual CPU may be set no higher than that of the other virtual CPUs directly switched from physical CPUs, but higher than that of background tasks. The time slices that a newly created virtual CPU may run on a physical CPU can also be set, so that the scheduler of the virtualization layer allocates a certain time slice to the virtual CPU in each scheduling period.
With the above arrangement, in the scheduling logic of the client operating system, a process can be scheduled to run on the newly created virtual CPU just as on other CPUs.
The physical processor is bound with a target process, and when the virtualization layer switches the running host operating system from a root mode to a virtualization mode, the physical processor also corresponds to the switched virtual CPU, namely the virtual CPU where the target process is located.
Considering that certain processes in the client operating system that need to run with high performance are likely to be bound to a CPU switched from a physical CPU, these processes need high performance when executing services, but they may also be in an idle state, such as a poll state, with no specific service to execute, which wastes physical resources.
In an actual scenario, in the existing scheduling logic of the virtualization layer, a time slice is allocated to a virtual CPU by the scheduler; but for a physical CPU bound to the target process, if it can be clearly known that the target process running on the physical CPU is currently in an idle state, the time slice can be actively yielded for the newly created virtual CPU to run.
In order for the target process to actively yield time slices, the process capable of yielding time slices may be selected first. In actual implementation, the specific manner of selecting the target process can be flexibly configured according to actual needs, which is not limited in this embodiment. A future-idle-time prediction function may be newly added to the target process; alternatively, the future idle time may be predicted by the target process before and after the poll processing function is executed.
As one example, the interval between busy running periods may be counted; for example, the next future idle time may be predicted based on the last 10 non-busy periods. The prediction may be implemented using an average, linear prediction, intelligent prediction through AI, or the like.
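As a concrete sketch of the simplest of these options, a moving average over the most recent idle periods might look as follows; the function name and the millisecond unit are illustrative assumptions, and linear or AI-based prediction could be substituted.

```python
def predict_idle_slice(idle_history_ms, window=10):
    """Predict the next idle time slice as the moving average of the
    most recent idle-period durations (in milliseconds). Returns 0.0
    when there is no history to predict from. Illustrative only."""
    recent = idle_history_ms[-window:]   # e.g. the last 10 non-busy periods
    if not recent:
        return 0.0
    return sum(recent) / len(recent)
```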
When the predicted length of the current idle time slice is obtained, the idle time information needs to be notified to the virtualization layer. Various notification manners can be adopted; for example, notification can be realized through a direct system call, such as by invoking a vmcall instruction, so that the virtualization layer obtains the idle time information.
Since the scheduling period of the underlying scheduler is very short, the overhead of frequently notifying through vmcall is very large. Based on this, this embodiment also designs a method in which the target process notifies the virtualization layer of the predicted idle time information through shared memory, and a shared memory area may be allocated to each physical CPU to store the information.
By way of example, the manner in which the shared memory region communicates may be agreed upon, for example, as shown in FIG. 2B, the shared memory region may be designed as follows:
① A first sub-region, to which the upper-layer target process only writes and from which the underlying scheduler only reads;
② a second sub-region, which is read-only for the upper-layer target process and write-only for the underlying scheduler;
③ a third sub-region, which is read-only for the upper-layer target process and write-only for the underlying scheduler.
The sizes of the three sub-areas may be set according to the size of the stored information, which is not limited in this embodiment.
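One possible concrete layout of such a region can be sketched with a packed structure; the field names, types, and sizes below are illustrative assumptions, since the text only fixes who reads and who writes each sub-region, not its encoding.

```python
import ctypes

class SharedRegion(ctypes.Structure):
    """A hypothetical per-physical-CPU shared memory region layout.
    The convention (who reads, who writes) follows the text; the
    concrete fields and widths are assumptions for illustration."""
    _fields_ = [
        # first sub-region: target process writes, scheduler reads
        ("idle_end_time_ns", ctypes.c_uint64),
        # second sub-region: scheduler writes, target process reads
        ("read_indication", ctypes.c_uint32),
        # third sub-region: scheduler writes, target process reads
        ("urgent_dispatch_flag", ctypes.c_uint32),
    ]
```

In a real implementation this structure would be placed in a page shared between the guest and the virtualization layer, with each side restricted to its read or write role.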
Optionally, in practical applications, the shared memory area may be further extended as needed. For example, a fourth sub-region may be included to store the allowed running time, where this information may be updated by the client operating system and may be configured to be smaller than the predicted idle time information (the allowed running time is smaller than the idle termination time). A fifth sub-region may also be included to store the actual running time set by the virtualization layer, updated by the virtualization layer. A sixth sub-region may also be included to store the start time set by the virtualization layer, likewise updated by the virtualization layer. This information may be used for more accurate scheduling and prediction; other designs may also be used in practice, which is not limited in this embodiment.
The target process may predict the available idle time slice; for example, the idle time information predicted by the target process while in an idle state may specifically be the idle termination time. When the idle time slice is long enough, the determined idle termination time may be written directly into the first sub-region of the shared memory area, or the target process may write it after a certain reduction. This handling considers that the underlying layer schedules the physical CPU by scheduling period, and an idle time slice much longer than the scheduling period is only meaningful relative to it, so the target process may temporarily store the predicted idle time slice locally in the process. The next time the target process runs on the physical CPU again, a new idle time slice can be re-determined according to the current busy-idle state and the temporarily stored idle time slice. For example, if the currently predicted idle time slice is 10 ms, it may be stored locally in the process, and the idle termination time may be determined based on the current time and the predicted idle time slice and written into the first sub-region. The underlying scheduler may read the idle time information stored in the first sub-region, store it locally, and determine the time slice of the new virtual CPU; this may, for example, be suitably reduced based on the underlying scheduling period, for example to 3 ms. The next time the target process is rescheduled to run on the physical CPU, it may again determine the idle time information written into the first sub-region based on the current time.
For example, when the virtual CPU where the target process is located runs on the physical CPU again and the target process is found to be still in an idle state, and, based on the current time and the locally stored idle termination time, that time has not yet been reached, the target process need not write the idle time information again, though it may write new idle time information as needed. For the underlying scheduler, in the next scheduling period the time slice of the new virtual CPU may again be determined based on the locally stored idle time information. For example, if the locally stored idle termination time is currently 8 ms away, a new idle time slice of 8 ms is obtained directly, and the time slice of the new virtual CPU is then again determined to be 3 ms.
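The writer-side rule of the two paragraphs above — rewrite the first sub-region only when no unexpired idle termination time exists — can be sketched as follows. The function and parameter names are hypothetical; `write_first_subregion` stands in for whatever mechanism actually writes the shared first sub-region.

```python
def maybe_write_idle_end(now, predicted_slice, cached_end, write_first_subregion):
    """Target-process side: write a fresh idle termination time into the
    first sub-region only when no unexpired one exists locally; otherwise
    keep the cached value. A sketch under assumed names and units."""
    if cached_end is not None and now < cached_end:
        return cached_end                 # still idle, deadline not reached
    new_end = now + predicted_slice       # e.g. now + a 10 ms predicted slice
    write_first_subregion(new_end)
    return new_end
```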
For a physical CPU on which the target process actively yields its time slice, when the physical CPU runs to a trap, the underlying scheduler can read the yieldable idle time information from the first sub-region of the shared memory area. For example, if a data value is stored in the first sub-region, the scheduler can read the idle time information, store it in the local memory space of the virtualization-layer scheduler, and write indication information into the second sub-region to indicate that the idle time information has been read. The next time the target process is switched to run, if it accesses the shared memory area and finds through the second sub-region that the idle time information has been read, it need not write the idle time information again until the idle termination time is reached. When the idle time information stored in the first sub-region expires, or when the emergency dispatch flag is set in the third sub-region, the information stored in the second sub-region is updated, e.g. cleared or set to a value indicating unread.
In the above embodiment, the virtualization layer stores the idle time information from the shared memory area locally to facilitate the local calculation of the actually allowed running time, since the value must not change between the start and end of the calculation. At the same time, so as not to prevent the upper layer from continuing to update the information — for example when it is found mid-run that the idle state no longer holds, or when the prediction algorithm evolves and an inaccurate prediction is replaced with a new value — the scheduler may, before using the locally cached idle time slice, check whether the future idle time stored in the shared memory area has changed, and then perform the corresponding calculation. For example, the locally stored idle time information is compared with the future idle time stored in the shared memory area: if they are consistent, the time slice that can be allocated to the new virtual CPU this time is calculated directly, using the locally stored value; if they are inconsistent, the local value is updated first and the calculation is then performed.
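This check-then-compute pattern can be sketched as a small refresh step; the dictionary cache and key name are illustrative assumptions.

```python
def refresh_idle_cache(shared_idle_end, cache):
    """Scheduler side: before computing with the locally cached idle
    termination time, compare it with the value currently stored in the
    shared first sub-region; on mismatch, adopt the shared value so the
    whole calculation uses one consistent copy. Illustrative sketch."""
    if shared_idle_end != cache.get("idle_end"):
        cache["idle_end"] = shared_idle_end   # upper layer updated its prediction
        return True                           # cache was stale and refreshed
    return False                              # cache already consistent
```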
When the scheduler schedules the physical CPU to run the new virtual CPU, whether an actively yielded idle time slice is currently available can be determined from the read idle time information, and the idle time information can be converted into a future deadline, i.e. a future moment up to which the physical CPU may run, namely the idle termination moment. The idle termination time that the target process writes into the shared first sub-region may differ from the deadline read and stored locally by the scheduler; for example, the scheduler may apply a reduction based on some policy to the idle termination time determined by the target process to obtain the deadline.
Each time the new virtual processor is scheduled to run, the scheduler can calculate the running duration for the current scheduling period from the deadline. Optionally, an upper limit may be set on each running duration, i.e. the running duration cannot exceed a certain threshold; the threshold may be the scheduling period, the maximum time slice of the current priority, or the like, and may be set as required.
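This duration calculation reduces to clamping the remaining time before the deadline by whatever upper bounds are configured; a minimal sketch, with hypothetical names:

```python
def current_run_duration(deadline, now, *caps):
    """Compute this scheduling period's running duration from the
    deadline, clamped by any configured upper bounds (e.g. the scheduling
    period or the maximum time slice of the current priority). Sketch."""
    duration = max(deadline - now, 0)   # no time left once the deadline passes
    for cap in caps:
        duration = min(duration, cap)
    return duration
```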
When the new virtual CPU runs to a trap, i.e. when it stops running on the physical CPU, it can be checked whether it has overrun (i.e. exceeded the allocated running duration), and at the same time whether the consumed time slice has reached the maximum yieldable idle time; if so, the CPU can be yielded back in time. Here, the maximum yieldable idle time may be smaller than the idle termination time (deadline), and may be stored in the aforementioned fourth sub-region.
When the new virtual CPU finishes running, it can be checked whether the actively yielded deadline has expired. If so, the underlying scheduler can clear the information stored in the second sub-region; when the target process next predicts the idle time information and finds that the second sub-region has been cleared, it can clear or update the idle time information in the first sub-region according to its busy-idle state.
In addition, in order to prevent the new virtual CPU from running excessively and affecting the normal service running of the target process, the total actively yielded time within a period is counted; if too much has been yielded, active yielding is suspended for a period of time, so as to avoid service performance fluctuation caused by an inaccurate prediction algorithm.
For the underlying scheduler, if an IPI or external interrupt sent to the target process is detected, the new virtual CPU may be taken out of running by issuing an interrupt instruction (e.g. a custom virtual interrupt) to the running virtual CPU, and the physical processor is then scheduled to resume running the target process. Meanwhile, an emergency dispatch flag can be set and stored in the third sub-region of the shared memory area; this information is written only by the underlying layer and is read-only for the target process, which avoids read-write contention. At the same time, the second sub-region may be cleared, indirectly ensuring that after the target process runs and finds the flag written in the third sub-region, it clears the first sub-region. If information has been written in the third sub-region, then after a set threshold delay, the underlying scheduler can automatically empty the third sub-region at the beginning of each scheduling period.
When the target process runs and identifies the emergency dispatch flag, this indicates a sudden scenario that needs handling: the process stops yielding time slices and clears the previously written predicted idle information, regardless of whether the yielded time has ended. The target process may then continue to run for a period of time; once a certain time threshold is reached and there is no task to execute, time-slice yielding can be restarted.
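The target-process side of this flow can be sketched as below. The dictionary layout and the way the "continue running for a threshold" condition is approximated (a resume timestamp) are illustrative assumptions, not the embodiment's mandated implementation.

```python
def on_resume(shared, state, now):
    """Target-process side: on resuming, honour the emergency dispatch
    flag from the third sub-region by clearing the first sub-region and
    pausing prediction/yielding until a continuous-run threshold passes.
    Dict keys and the threshold scheme are illustrative assumptions."""
    if shared["urgent_flag"]:                  # flag read from third sub-region
        shared["idle_end"] = 0                 # clear the first sub-region
        state["resume_at"] = now + state["threshold"]
    if state.get("resume_at", 0) > now:
        return False                           # yielding still suspended
    return True                                # may predict and yield again
```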
Alternatively, emergency scheduling can also be realized in other ways. For example, if the above approach does not strike a good balance between time-slice yielding and emergency response, the underlying scheduler can be notified in other ways to wake up the upper-layer application urgently, for example by reserving a user-mode interface through which the relevant thresholds are modified, thereby changing the time-slice yielding strategy and gradually accumulating preferred threshold settings for different scenarios. As examples, the relevant threshold modified here may be the choice of prediction algorithm, the maximum running time on a single physical CPU (i.e. the aforementioned information stored in the fourth sub-region, i.e. the proportion of the time slice that the new virtual CPU may run per scheduling period), or the proportion of idle time slices occupied by all new virtual CPUs, etc.
As shown in FIG. 3, a processor scheduling method of a physical machine according to an exemplary embodiment of the present disclosure is shown; the physical machine includes a physical processor, the method is applied to a target process, and the method includes:
In step 302, idle time information is predicted if the target process determines that it is in an idle state while running on the physical processor.
In step 304, the idle time information is written into a first sub-region of a shared memory area in a memory. The shared memory area is shared with a virtualization layer in a host operating system of the physical machine; the virtualization layer is configured to switch the running host operating system from a root mode to a virtualization mode, and, after the virtual processor where the target process is located and a new virtual processor are created in the virtualization mode, to read the first sub-region of the shared memory area to obtain the idle time information and, based on the idle time information, exit the virtual processor where the target process in an idle state is located from running on the physical processor and schedule the physical processor to run the new virtual processor.
The implementation process of this embodiment may refer to the description of the foregoing embodiment, and will not be described herein.
Accordingly, for the foregoing embodiments, please refer to FIG. 2A: the physical machine includes a physical processor and a host operating system running on the physical processor, and the virtualization layer and the target process may be used to execute the steps of the foregoing embodiments, respectively.
As one example, the virtualization layer may switch the running host operating system from the root mode to the virtualization mode when needed. For example, the physical machine may include hardware resources including at least one physical computing resource object and a physical address space having host physical addresses, on which a host operating system runs. Switching the running host operating system from the root mode to the virtualization mode may include: creating a memory page table for storing the mapping relationship between guest physical addresses in the virtualization mode and host physical addresses in the root mode; creating the information bearing objects required for the mode switch, for achieving synchronization of context information between the root mode and the virtualization mode; and switching the running host operating system from the root mode to the virtualization mode based on the information bearing objects and the memory page table.
Optionally, creating the information bearing objects required for the mode switch may include: creating, for any physical computing resource object, a physical descriptor structure for storing the context information of that physical computing resource in the root mode during the mode switch; creating and initializing, for any physical computing resource object, a virtualized descriptor structure for synchronizing the context information in the physical descriptor structure during the mode switch; and creating and initializing, for any physical computing resource object, a virtualized control structure for storing the running state information and running control information of that physical computing resource object in the virtualization mode.
Optionally, switching the running host operating system from the root mode to the virtualization mode based on the information bearing objects and the memory page table may include: in the case of a mode switch of a target physical computing resource object, storing the context information of the target physical computing resource object in the root mode into the physical descriptor structure corresponding to that object; synchronizing the values of the registers in the context information in the physical descriptor structure and the configuration information of the segment registers into the virtualized descriptor structure and the virtualized control structure corresponding to the target physical computing resource object, respectively; and controlling the running of the target physical computing resource object according to the virtualized descriptor structure, the virtualized control structure, and the memory page table, so as to switch the running host operating system from the root mode to the virtualization mode.
Further optionally, storing the context information of the target physical computing resource object in the root mode into the corresponding physical descriptor structure may include: storing the values of the registers in the context information and the stack-top and stack-bottom addresses of a second stack into the physical descriptor structure, where the second stack is the stack used by the target physical computing resource object in the root mode; and switching the second stack used by the target physical computing resource object in the root mode to a first stack corresponding to the target physical computing resource object in the virtualization mode.
Further optionally, controlling the running of the target physical computing resource object according to the virtualized descriptor structure, the virtualized control structure, and the memory page table may include: loading the values of the special registers in the virtualized control structure and the virtualized descriptor structure; injecting the running control information in the virtualized control structure into the target physical computing resource object and loading the values of the general registers in the virtualized descriptor structure; and executing a mode-switch instruction to control the target physical computing resource object to enter the virtualization mode and run from the first instruction, with memory management and access performed based on the memory page table during running.
Corresponding to the foregoing embodiments of the processor scheduling method of the physical machine, the present specification also provides embodiments of the processor scheduling apparatus of the physical machine and a computer device to which the processor scheduling apparatus is applied.
Embodiments of the processor scheduling apparatus of the physical machine of the present specification may be applied to a computer device, such as a server or a terminal device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the device where it is located reading corresponding computer program instructions from a nonvolatile memory into memory. In terms of hardware, FIG. 4 shows a hardware structure diagram of the computer device where the processor scheduling apparatus of the physical machine in the present specification is located. In addition to the processor 410, the network interface 420, the memory 430, and the nonvolatile memory 440 shown in FIG. 4, the computer device where the apparatus of the embodiment is located may generally include other hardware according to its actual function, which is not described herein again.
As shown in fig. 5, which is a block diagram of a processor scheduling apparatus of a physical machine according to an exemplary embodiment of the present specification, the physical machine includes a physical processor, and the physical processor is bound with a target process, the apparatus includes:
a creation module 51 for switching the running host operating system from a root mode to a virtualized mode in which a virtual processor is created;
an obtaining module 52, configured to obtain idle time information predicted after the target process is in an idle state;
and a scheduling module 53 configured to, based on the idle time information, exit the target process in an idle state from running on the physical processor, and schedule the physical processor to run the virtual processor.
As shown in fig. 6, which is a block diagram of a processor scheduling apparatus of a physical machine according to an exemplary embodiment of the present specification, the physical machine includes a physical processor, a target process is bound to the physical processor, the apparatus is applied to the target process, and the apparatus includes:
a prediction module 61, configured to predict idle time information if the target process determines that it is in an idle state while running on the physical processor;
The writing module 62 is configured to write the idle time information into a first sub-area of a shared storage area in a memory, where the shared storage area is used to be shared with a virtualization layer in a host operating system of the physical machine, so that the virtualization layer switches the running host operating system from a root mode to a virtualization mode, after a virtual processor is created in the virtualization mode, reads the first sub-area in the shared storage area to obtain the idle time information, and based on the idle time information, the target process in an idle state is taken out of running from the physical processor, and the physical processor is scheduled to run the virtual processor.
The implementation process of the functions and roles of each module in the processor scheduling device of the physical machine is specifically detailed in the implementation process of corresponding steps in the processor scheduling method of the physical machine, and is not described herein again.
Accordingly, the present description also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the processor scheduling method embodiment of the aforementioned physical machine.
Accordingly, the embodiments of the present disclosure further provide a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the processor scheduling method embodiment of the physical machine when the processor executes the program.
Accordingly, the present description also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the processor scheduling method embodiment of a physical machine.
For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative, wherein the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present description. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The above-described embodiments may be applied to one or more computer devices, which are devices capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance. The hardware of the computer devices includes, but is not limited to, microprocessors, Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), embedded devices, and the like.
The computer device may be any electronic product that can interact with a user in a human-computer manner, such as a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game console, an interactive Internet Protocol Television (IPTV), a smart wearable device, etc.
The computer device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of a plurality of network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.
The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The above method steps are divided for clarity of description and may, in implementation, be combined into one step or split into multiple steps, so long as the same logical relationship is preserved; all such variations fall within the scope of protection of the present application. Likewise, adding insignificant modifications to an algorithm or process, or introducing insignificant designs, without changing the core design of the algorithm and process, also falls within the scope of protection of the present application.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Reference in this specification to "a specific example," "some examples," or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present description. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. The present description is intended to cover any variations, uses, or adaptations thereof that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the present description pertains. The specification and examples are to be considered exemplary only, with the true scope and spirit of the present description being indicated by the following claims.
It is to be understood that the present description is not limited to the precise arrangements and instrumentalities described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present description is limited only by the appended claims.
The foregoing description of the preferred embodiments is provided for the purpose of illustration only, and is not intended to limit the scope of the disclosure, since any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the disclosure are intended to be included within the scope of the disclosure.