HK1170876A - Entropy coder for image compression - Google Patents


Publication number
HK1170876A
Authority
HK
Hong Kong
Application number
HK12111379.4A
Inventor
N.Y.阿布多
Original Assignee
Microsoft Technology Licensing, LLC (微软技术许可有限责任公司)

Description

Entropy encoder for image compression
Technical Field
The present invention relates to image compression, and in particular to an entropy encoder used in image compression.
Background
An increasingly popular form of network communication is commonly referred to as a remote presentation system, which can use protocols such as the Remote Desktop Protocol (RDP) and Independent Computing Architecture (ICA) to share a desktop and other applications executing on a server with a remote client. Such computing systems typically transmit keyboard presses and mouse clicks or selections from the client to the server and relay screen updates back in the other direction over a network connection (e.g., the Internet). Thus, the user has the experience of his or her machine operating entirely locally, when in reality the client device is only sent screenshots of the desktop or application as it appears on the server side.
In a remote desktop environment, data representing graphics to be transmitted to a client is typically compressed by a server, transmitted from the server to the client over a network, and decompressed by the client and displayed on a local user display. The process of encoding data typically requires significant processor computation cycles to compress and decompress the data. Such processing requirements may have a direct impact on the encoding and decoding latencies from the server to the client and negatively impact the remote user experience.
Disclosure of Invention
One problem with remote presentation systems is that such systems tend to favor data compression at the expense of processor performance. Many systems assume that bandwidth is more likely to be limited and thus sacrifice processor performance in order to achieve a higher level of data compression, thereby reducing the amount of data that needs to be transferred over a limited bandwidth data link. However, many remote presentation clients today are low-end devices that may use lower speed processors but have access to sufficient bandwidth. In this case, a simpler compressor and a less computationally demanding compression technique can be used to improve overall performance and user experience, even if it means a reduction in compression.
In various embodiments, methods and systems for a fast entropy encoder/decoder used in real-time image compression are disclosed. For example, a method of processing graphics data for transmission to a remote computing device may include: receiving graphics data representing a client screen to be rendered; receiving information indicative of the bandwidth available for transmission; determining, based on the information, that the available bandwidth satisfies a predetermined threshold; and entropy encoding the graphics data using a coded stream of fixed bit-size units, in which runs of zeros are encoded in a variable number of units of the fixed bit size, and a literal value is encoded using either an entry in a cache of recently used literal values or a variable number of units of the fixed bit size.
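By way of illustration only, the encoding summarized above might be sketched as follows. The sketch is not the claimed bit layout: it assumes a 4-bit unit (a nibble), hypothetical tag values, a hypothetical cache of four entries that is updated most-recently-used first, non-negative integer input values, and a reading of "satisfies a predetermined threshold" as "greater than or equal to". None of these particulars is stated in this section.

```python
from typing import List, Tuple

CACHE_SIZE = 4        # assumed size of the recently-used-literal cache
TAG_ZERO_RUN = 0x0    # assumed tag: a run of zeros follows
TAG_LITERAL = 0xF     # assumed tag: an explicit literal follows
# Tags 0x1..0x4 are assumed to mean "cache entry 0..3".


def use_fast_coder(available_kbps: float, threshold_kbps: float) -> bool:
    """Pick the fast coder when bandwidth meets the (assumed >=) threshold."""
    return available_kbps >= threshold_kbps


def _put_varint(out: List[int], n: int) -> None:
    """Append n as nibbles: 3 payload bits each, high bit set if more follow."""
    while True:
        low, n = n & 0x7, n >> 3
        out.append(low | (0x8 if n else 0x0))
        if not n:
            return


def _get_varint(nibbles: List[int], pos: int) -> Tuple[int, int]:
    """Read a value written by _put_varint; return (value, next position)."""
    value = shift = 0
    while True:
        nib = nibbles[pos]
        pos += 1
        value |= (nib & 0x7) << shift
        shift += 3
        if not nib & 0x8:
            return value, pos


def _touch(cache: List[int], v: int) -> None:
    """Move v to the front of the cache and drop the oldest entry if full."""
    if v in cache:
        cache.remove(v)
    cache.insert(0, v)
    del cache[CACHE_SIZE:]


def encode(values: List[int]) -> List[int]:
    """Encode non-negative integers into a list of nibbles (0..15)."""
    out: List[int] = []
    cache: List[int] = []
    i = 0
    while i < len(values):
        v = values[i]
        if v == 0:
            run = 1
            while i + run < len(values) and values[i + run] == 0:
                run += 1
            out.append(TAG_ZERO_RUN)
            _put_varint(out, run)          # run length in a variable number of nibbles
            i += run
            continue
        if v in cache:
            out.append(1 + cache.index(v))  # one nibble for a cache hit
        else:
            out.append(TAG_LITERAL)
            _put_varint(out, v)             # literal in a variable number of nibbles
        _touch(cache, v)
        i += 1
    return out


def decode(nibbles: List[int]) -> List[int]:
    """Invert encode(), rebuilding the same cache state as the encoder."""
    out: List[int] = []
    cache: List[int] = []
    pos = 0
    while pos < len(nibbles):
        tag = nibbles[pos]
        pos += 1
        if tag == TAG_ZERO_RUN:
            run, pos = _get_varint(nibbles, pos)
            out.extend([0] * run)
            continue
        if tag == TAG_LITERAL:
            v, pos = _get_varint(nibbles, pos)
        else:
            v = cache[tag - 1]
        out.append(v)
        _touch(cache, v)
    return out
```

In this sketch a run of 100 zeros occupies four nibbles, and a repeated literal costs a single nibble once it is in the cache. Because every token is a whole number of nibbles, the decoder needs no bit-level shifting across token boundaries, which is the kind of simplicity a low-end client processor would benefit from.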
Drawings
The systems, methods, and computer-readable media for processing image data for transmission to a remote computing device according to the present specification are further described with reference to the accompanying drawings, in which:
FIGS. 1 and 2 depict example computer systems in which aspects of the present invention may be implemented.
FIG. 3 depicts an operational environment for practicing aspects of the present disclosure.
FIG. 4 depicts an operational environment for practicing aspects of the present disclosure.
FIG. 5 illustrates a computer system including circuitry for implementing remote desktop services.
FIG. 6 illustrates a computer system including circuitry for implementing remote services.
FIG. 7 shows an example of a decoding process.
FIG. 8 shows an example of an encoding process.
FIG. 9 illustrates an example of an operational procedure for processing graphics data to be transmitted to a client computer.
FIG. 10 illustrates an example of an operational procedure for processing graphics data to be transmitted to a client computer.
FIG. 11 illustrates an example system for processing graphics data to be transmitted to a client computer.
Detailed Description
Generalized computing environment
Certain specific details are set forth in the following description and figures to provide a thorough understanding of various embodiments of the invention. Certain well-known details often associated with computing and software technology are not set forth in the following disclosure to avoid unnecessarily obscuring the various embodiments of the invention. Furthermore, those of ordinary skill in the relevant art will understand that they can practice other embodiments of the invention without one or more of the details described below. Finally, although the various methods are described in the following disclosure with reference to steps and sequences, the description as such is for providing a clear implementation of embodiments of the invention, and the steps and sequences of steps should not be taken as required to practice this invention.
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs may implement or utilize the processes described in connection with the present invention, e.g., through the use of an API, reusable controls, or the like. Such programs are preferably implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and may be combined with hardware implementations.
A remote desktop system is a computer system that maintains applications that are remotely executable by a client computer system. Input is entered at the client computer system and transmitted over a network (e.g., using a protocol based on the International Telecommunications Union (ITU) T.120 family of protocols, such as the Remote Desktop Protocol (RDP)) to an application on a terminal server. The application processes the input as if it were entered at the terminal server. The application generates output in response to the received input and communicates the output to the client over the network.
Embodiments may execute on one or more computers. FIGS. 1 and 2 and the following discussion are intended to provide a brief general description of a suitable computing environment in which the invention may be implemented. Those skilled in the art will appreciate that the computer systems 200, 300 may have some or all of the components described with reference to the computer 100 of FIGS. 1 and 2.
The term circuitry used throughout the disclosure may include hardware components such as hardware interrupt controllers, hard drives, network adapters, graphics processors, hardware-based video/audio codecs, and the firmware/software used to operate such hardware. The term circuitry may also include a microprocessor, or one or more logical processors, e.g., one or more cores of a multi-core general processing unit, configured to perform functions in a particular manner, either through firmware or through a set of switches. The logical processor in this example may be configured by software instructions embodying logic operable to perform functions loaded from memory, e.g., RAM, ROM, firmware, and/or virtual memory. In an example embodiment where circuitry includes a combination of hardware and software, an implementer may write source code embodying logic that is subsequently compiled into machine readable code that can be executed by a logical processor. Because those skilled in the art will appreciate that the state of the art has evolved to the point where there is little difference between hardware, software, or a combination of hardware/software, the selection of hardware versus software to implement functionality is merely a design choice. Thus, since one skilled in the art can appreciate that a software process can be transformed into an equivalent hardware structure, and a hardware structure itself can be transformed into an equivalent software process, the choice of a hardware implementation or a software implementation is trivial and left to the implementer.
FIG. 1 depicts an example of a computing system configured in accordance with aspects of the present invention. The computing system may include a computer 20 or the like, including a processing unit 21, a system memory 22, and a system bus 23 that couples various system components including the system memory to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes Read Only Memory (ROM) 24 and Random Access Memory (RAM) 25. A basic input/output system (BIOS) 26, containing the basic routines that help to transfer information between elements within the computer 20, such as during start-up, is stored in ROM 24. The computer 20 may also include a hard disk drive 27 for reading from and writing to a hard disk (not shown), a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD ROM or other optical media. In some example embodiments, computer executable instructions implementing aspects of the present invention may be stored in ROM 24, a hard disk (not shown), RAM 25, a removable magnetic disk 29, an optical disk 31, and/or a cache memory of processing unit 21. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the computer 20.
Although the environment described herein employs a hard disk, a removable magnetic disk 29, and a removable optical disk 31, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, Random Access Memories (RAMs), Read Only Memories (ROMs), and the like, may also be used in the operating environment.
A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24, or RAM 25, including an operating system 35, one or more application programs 36, other program modules 37, and program data 38. A user may enter commands and information into the computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or a Universal Serial Bus (USB). A display 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the display 47, computers typically include other peripheral output devices (not shown), such as speakers and printers. The system of FIG. 1 also includes a host adapter 55, Small Computer System Interface (SCSI) bus 56, and an external storage device 62 connected to the SCSI bus 56.
The computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another computer, a server, a router, a network PC, a peer device or other common network node, or a virtual machine, and typically includes many or all of the elements described above relative to the computer 20, although only a memory storage device 50 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 can include a Local Area Network (LAN) 51 and a Wide Area Network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
When used in a LAN networking environment, the computer 20 is connected to the LAN 51 through a network interface or adapter 53. When used in a WAN networking environment, the computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, may be connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are examples and other means of establishing a communications link between the computers may be used. Furthermore, while it is contemplated that many embodiments of the present invention are particularly well-suited for use with a computer system, the present invention is not intended to be limited to the disclosure of such embodiments.
Referring now to FIG. 2, another embodiment of an exemplary computing system 100 is depicted. Computer system 100 may include a logical processor 102, such as an execution core. Although one logical processor 102 is shown, in other embodiments, the computer system 100 may have multiple logical processors, e.g., multiple execution cores per processor substrate, and/or multiple processor substrates that may each have multiple execution cores. As shown, the various computer-readable storage media 110 may be interconnected by one or more system buses that couple the various system components to the logical processor 102. The system bus may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. In an example embodiment, computer-readable storage media 110 may include, for example, Random Access Memory (RAM) 104, storage 106 (e.g., an electromechanical hard drive, a solid state hard drive, etc.), firmware 108 (e.g., flash RAM or ROM), and removable storage 118 (e.g., a CD-ROM, a floppy disk, a DVD, a flash drive, an external storage device, etc.). It should be appreciated by those skilled in the art that other types of computer readable storage media can be used, such as magnetic cassettes, flash memory cards, digital video disks, and Bernoulli cartridges.
Computer-readable storage media provide non-volatile storage of processor-executable instructions 122, data structures, program modules, and other data for the computer 100. A basic input/output system (BIOS) 120, containing the basic routines that help to transfer information between elements within computer system 100, such as during start-up, may be stored in firmware 108. A number of programs, including an operating system and/or application programs, may be stored on firmware 108, storage device 106, RAM 104, and/or removable storage device 118 and executed by logical processor 102.
Commands and information may be received by computer 100 through input devices 116 that may include, but are not limited to, a keyboard and a pointing device. Other input devices may include a microphone, joystick, game pad, scanner, or the like. These and other input devices are often connected to the logical processor 102 through a serial port interface that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port, or Universal Serial Bus (USB). A display or other type of display device is also connected to the system bus via an interface, such as a video adapter, which may be part of the graphics processor 112 or may be connected to the graphics processor 112. In addition to the display, computers typically include other peripheral output devices (not shown), such as speakers and printers. The exemplary system of FIG. 1 can also include a host adapter, Small Computer System Interface (SCSI) bus, and an external storage device connected to the SCSI bus.
The computer system 100 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer. The remote computer may be another computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer system 100.
When used in a LAN or WAN networking environment, computer system 100 can be connected to the LAN or WAN through a network interface card (NIC) 114. The NIC 114, which may be internal or external, may be connected to the system bus. In a networked environment, program modules depicted relative to the computer system 100, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections described are exemplary and other means of establishing a communications link between the computers may be used. Further, while it is contemplated that many embodiments of the present invention are particularly well-suited for computerized systems, nothing in this description is meant to limit the invention to those embodiments.
A remote desktop system is a computer system that maintains applications that are remotely executable by a client computer system. Input is entered at a client computer system and transmitted over a network (e.g., using a protocol based on the International Telecommunications Union (ITU) T.120 family of protocols, such as the Remote Desktop Protocol (RDP)) to an application on a terminal server. The application processes the input as if it were entered at the terminal server. The application generates output in response to the received input, and the output is transmitted over a network to the client computer system. The client computer system renders the output data. Thus, input is received and output is presented at the client computer system, while processing actually occurs at the terminal server. A session may include a shell and a user interface such as a desktop, a subsystem that tracks mouse movements within the desktop, a subsystem that translates mouse clicks on an icon into commands that launch a program instance, and so forth. In another example embodiment, a session may include an application. In this example, the desktop environment may still be generated but hidden from the user while the application is presented. It should be appreciated that the foregoing discussion is exemplary and that the presently disclosed subject matter can be implemented in a variety of client/server environments and is not limited to a particular terminal services product.
In most, if not all, remote desktop environments, input data (entered at the client computer system) typically includes mouse and keyboard data representing commands to an application, and output data (generated by an application program at the terminal server) typically includes video data for display on a video output device. Many remote desktop environments also include functionality that extends to the transfer of other types of data.
Communication channels may be used to extend the RDP protocol by allowing plug-ins to transfer data over an RDP connection. Many such extensions exist. Features such as printer redirection, clipboard redirection, port redirection, etc., use communication channel techniques. Thus, many communication channels may be required to transfer data in addition to the input and output data. As a result, requests to transfer output data and one or more channel requests to transfer other data may at times contend for the available network bandwidth.
Referring now to FIGS. 3 and 4, depicted are high-level block diagrams of computer systems configured to implement virtual machines. As shown, computer system 100 may include the elements described in FIGS. 1 and 2, as well as components that may be used to implement a virtual machine. One such component is a hypervisor 202, which may also be referred to in the art as a virtual machine monitor. Hypervisor 202 in the depicted embodiment can be configured to control and arbitrate access to the hardware of computer system 100. Broadly, hypervisor 202 can generate execution environments called partitions, such as child partition 1 through child partition N (where N is an integer greater than or equal to 1). In various embodiments, a child partition may be considered the basic unit of isolation supported by hypervisor 202, i.e., each child partition may be mapped to a set of hardware resources, e.g., memory, devices, logical processor cycles, etc., that is under the control of hypervisor 202 and/or the parent partition, and hypervisor 202 may isolate one partition from accessing the resources of another partition. In various embodiments, hypervisor 202 can be a stand-alone software product, a part of an operating system, embedded within firmware of a motherboard, a specialized integrated circuit, or a combination thereof.
In the depicted example, computer system 100 includes parent partition 204, which may also be considered domain 0 in the open source community. The parent partition 204 may be configured to provide resources to guest operating systems executing in the child partitions 1-N by using virtualization service providers (VSPs) 228, also referred to as back-end drivers in the open source community. In this example architecture, parent partition 204 may gate access to the underlying hardware. VSPs 228 can be used to multiplex interfaces to hardware resources through virtualization service clients (VSCs), also referred to as front-end drivers in the open source community. Each child partition may include one or more virtual processors, such as virtual processors 230 through 232, on which guest operating systems 220 through 222 may manage and schedule threads to execute. Generally, the virtual processors 230 through 232 are executable instructions and associated state information that provide a representation of a physical processor with a particular architecture. For example, one virtual machine may have a virtual processor with the characteristics of an Intel x86 processor, while another virtual processor may have the characteristics of a PowerPC processor. The virtual processors in this example may be mapped to logical processors of the computer system such that the instructions implementing the virtual processors are backed by the logical processors. As such, in these example embodiments, multiple virtual processors may be executing simultaneously while, for example, another logical processor is executing hypervisor instructions. In general, and as shown, the combination of virtual processors, various VSCs, and memory in a partition can be considered a virtual machine, such as virtual machine 240 or 242.
In general, guest operating systems 220 through 222 can include any operating system, such as, for example, an operating system from the open source community. The guest operating system may include a user/kernel mode of operation and may have a kernel that may include a scheduler, memory manager, and the like. Kernel mode may include an execution mode in a logical processor that grants access to at least privileged processor instructions. Each guest operating system 220 through 222 may have an associated file system on which applications such as terminal servers, e-commerce servers, e-mail servers, etc., as well as the guest operating systems themselves, are stored. Guest operating systems 220 through 222 can schedule threads to execute on virtual processors 230 through 232 and can implement instances of such applications.
Referring now to FIG. 4, an alternative architecture that can be used to implement a virtual machine is shown. FIG. 4 depicts components similar to those of FIG. 3, but in this example embodiment hypervisor 202 can include virtualization service provider 228 and device driver 224, and parent partition 204 can contain configuration utility 236. In this architecture, hypervisor 202 may perform the same or similar functions as hypervisor 202 in FIG. 3. Hypervisor 202 of FIG. 4 can be a stand-alone software product, a part of an operating system, or embedded within firmware of a motherboard, or a portion of hypervisor 202 can be implemented by an application specific integrated circuit. In this example, parent partition 204 may have instructions available to configure hypervisor 202; however, hardware access requests may be handled by hypervisor 202 rather than passed to parent partition 204.
Referring now to FIG. 5, a computer 100 may include circuitry configured to provide remote desktop services to connected clients. In an example embodiment, the depicted operating system 400 can execute directly on the hardware, or a guest operating system 220 or 222 can be implemented by a virtual machine, such as VM 216 or VM 218. The underlying hardware 208, 210, 234, 212, and 214 is drawn in dashed lines to indicate that the hardware may be virtualized.
The remote service may be provided to at least one client, such as client 401 (although one client is depicted, the remote service may be provided to more clients). The example client 401 may comprise a computer terminal implemented by hardware configured to direct user input to a remote server session and display user interface information generated by the session. In another embodiment, client 401 may be implemented by a computer that includes elements similar to those in computer 100 of FIG. 2. In this embodiment, the client 401 may include circuitry configured to implement an operating system and circuitry configured to emulate the functionality of a terminal (e.g., a remote desktop client application executable by one or more logical processors 102). Those skilled in the art will appreciate that circuitry configured to implement an operating system may also include circuitry configured to emulate a terminal.
Each connected client may have a session (such as session 404) that allows the client to access data and applications stored on computer 100. In general, application programs and certain operating system components may be loaded into a region of memory allocated to a session. Thus, in some instances, some OS components may be spawned N times (where N represents the current number of sessions). These various OS components may request services from the operating system kernel 418, which may, for example, manage memory, facilitate disk reads/writes, and schedule threads from each session to execute on the logical processor 102. Some example subsystems that may be loaded into the session space may include a subsystem that generates a desktop environment, a subsystem that tracks mouse movements within a desktop, a subsystem that translates mouse clicks on an icon into commands that launch a program instance, and so forth. The processes that implement these services, e.g., tracking mouse movements, are tagged with an identifier associated with the session and loaded into a memory area allocated to the session.
The session may be generated by a session manager 416, such as a process. For example, session manager 416 may initialize and manage each remote session by: generating a session identifier for a session space; allocating memory to the session space; and generating instances of system environment variables and subsystem processes in memory allocated to the session space. Session manager 416 may be invoked when operating system 400 receives a request for a remote desktop session.
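The three initialization steps listed above can be sketched as follows, purely as an illustration. The class names, the fixed-size `bytearray` standing in for session memory, and the subsystem names are all hypothetical; the specification does not define a programming interface for session manager 416.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    """Hypothetical record of one remote session's private state."""
    session_id: int
    memory: bytearray                      # stands in for the session space
    environment: Dict[str, str]            # per-session environment variables
    subsystems: List[str] = field(default_factory=list)


class SessionManager:
    """Minimal sketch of the initialization steps attributed to a session manager."""

    SUBSYSTEMS = ("desktop", "mouse-tracker", "click-translator")  # assumed names

    def __init__(self, session_memory_bytes: int = 4096) -> None:
        self._next_id = itertools.count(1)
        self._memory_bytes = session_memory_bytes
        self.sessions: Dict[int, Session] = {}

    def create_session(self, system_environment: Dict[str, str]) -> Session:
        # 1. Generate a session identifier for the session space.
        session_id = next(self._next_id)
        # 2. Allocate memory to the session space.
        memory = bytearray(self._memory_bytes)
        # 3. Create per-session instances of environment variables and subsystems.
        environment = dict(system_environment)
        environment["SESSION_ID"] = str(session_id)
        session = Session(session_id, memory, environment)
        session.subsystems = [f"{name}#{session_id}" for name in self.SUBSYSTEMS]
        self.sessions[session_id] = session
        return session
```

Copying the environment for each session keeps a change made in one session from leaking into another, which mirrors the per-session instances described above.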
The connection request may first be processed by a transport stack 410, such as a Remote Desktop Protocol (RDP) stack. Transport stack 410 may include instructions that configure logical processor 102 to listen for connection messages on a particular port and forward these messages to session manager 416. When sessions are generated, transport stack 410 may instantiate a remote desktop protocol stack instance for each session. Stack instance 414 is an example stack instance that may be generated for session 404. In general, each remote desktop protocol stack instance may be configured to route output to an associated client and to route client input to the environment subsystem 444 for the appropriate remote session.
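A minimal in-memory sketch of this routing is given below. It assumes hypothetical class and method names, replaces the network port with a direct method call, and models the session manager as a callback that returns a session identifier; it is not the RDP stack itself.

```python
from typing import Callable, Dict, List


class StackInstance:
    """Hypothetical per-session stack instance holding two one-way queues."""

    def __init__(self, session_id: int) -> None:
        self.session_id = session_id
        self.to_client: List[bytes] = []      # output routed to the associated client
        self.to_subsystem: List[bytes] = []   # client input routed to the session

    def route_output(self, data: bytes) -> None:
        self.to_client.append(data)

    def route_input(self, data: bytes) -> None:
        self.to_subsystem.append(data)


class TransportStack:
    """Forwards connection messages and creates one stack instance per session."""

    def __init__(self, create_session: Callable[[str], int]) -> None:
        # create_session plays the role of the session manager (an assumption).
        self._create_session = create_session
        self.instances: Dict[int, StackInstance] = {}

    def on_connection_message(self, client_address: str) -> StackInstance:
        session_id = self._create_session(client_address)
        instance = StackInstance(session_id)
        self.instances[session_id] = instance
        return instance
```

Keeping one instance per session means input from one client can only ever reach the subsystem queue of that client's own session.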
As shown, in an embodiment, an application 448 (although one is shown, others may also execute) may execute and generate an array of bits. The array may be processed by a graphics interface 446, which in turn may render a bitmap, e.g., an array of pixel values, that may be stored in memory. As shown, a remote display subsystem 420 can be instantiated, which can capture rendering calls and send the calls over a network to the client 401 via the stack instance 414 for the session.
In addition to remoting graphics and audio, a plug-and-play redirector 458 may also be instantiated to remote (i.e., redirect) different devices such as a printer, mp3 player, client file system, CD ROM drive, etc. Plug-and-play redirector 458 may receive information from a client-side component that identifies a peripheral device coupled to client 401. Plug-and-play redirector 458 may then configure operating system 400 to load redirection device drivers for the peripheral devices of client 401. The redirection device driver may receive a call from operating system 400 to access a peripheral device and send the call over a network to client 401.
As discussed above, clients may use a protocol for providing remote presentation services, such as the Remote Desktop Protocol (RDP), to connect to a resource using terminal services. When a remote desktop client connects to a terminal server via a terminal server gateway, the gateway can open a socket connection with the terminal server and redirect client communications to the remote presentation port or a port dedicated to remote access services. The gateway may also perform certain gateway-specific exchanges with the client using a terminal server gateway protocol delivered over HTTPS.
Turning to FIG. 6, depicted is a computer system 100 that includes circuitry for implementing remote services and incorporating aspects of the present invention. As shown, in embodiments, computer system 100 may include components similar to those depicted in FIGS. 2 and 5, and may implement a remote presentation session. In one embodiment of the invention, a remote presentation session may include aspects of a console session, such as a session that is derived for a user using a computer system and a remote session. Similar to the above, session manager 416 may initialize and manage a remote presentation session by enabling/disabling components to implement the remote presentation session.
One set of components that may be loaded into a telepresence session is a high fidelity remoting enabled console component, i.e., a component that utilizes 3D graphics and 2D graphics rendered by 3D hardware.
The 3D/2D graphics rendered by the 3D hardware may be accessed using a driver model that includes a user mode driver 522, an API 520, a graphics kernel 524, and a kernel mode driver 530. The application 448 (or any other process, such as a user interface, that generates 3D graphics) can generate API constructs and send them to an application programming interface (API) 520, such as Direct3D from Microsoft Corporation. The API 520, in turn, may communicate with a user mode driver 522, which may generate primitives, i.e., the basic geometries used in computer graphics, represented as vertices and constants, that serve as building blocks for other shapes, and store the primitives in a buffer such as a memory page. In one embodiment, the application 448 can declare how it will use the buffer, e.g., what type of data it will store in the buffer. An application, such as a video game, may use a dynamic buffer to store primitives for an avatar and a static buffer to store data that will change infrequently, such as data representing buildings or forests.
Continuing with the description of the driver model, the application may fill the buffer with primitives and issue execution commands. When an application issues an execution command, the buffer may be appended to the run list by kernel mode driver 530 and scheduled by graphics kernel scheduler 528. Each graphical source, such as an application or user interface, may have a context and its own run list. The graphics kernel 524 may be configured to schedule various contexts to execute on the graphics processing unit 112. The GPU scheduler 528 may be executed by the logical processor 102, and the scheduler 528 may issue commands to the kernel mode driver 530 to render the contents of the buffers. Stack instance 414 may be configured to receive commands and send the contents of the buffer to client 401 over a network, where the buffer may be processed by the GPU of the client 401.
An example of the operation of a virtualized GPU used in conjunction with an application invoking a telepresence service is now described. Referring to FIG. 6, in an embodiment, a virtual machine session may be generated by the computer 100. For example, the session manager 416 may be executed by the logical processor 102 and may initialize remote sessions that include particular remote components. In this example, the generated session may include the kernel 418, the graphics kernel 524, the user mode display driver 522, and the kernel mode display driver 530. User mode driver 522 may generate primitives that may be stored in memory. For example, the API 520 can include an interface that can be exposed to processes, such as a user interface, for the operating system 400 or the application programs 448. A process may send high-level API commands to the API 520, such as a Point List, a Line List, a Line Strip, a Triangle List, a Triangle Strip, or a Triangle Fan. API 520 may receive these commands and convert them into commands for user mode driver 522, which may then generate the vertices and store them in one or more buffers. The GPU scheduler 528 may run and determine to render the contents of the buffer. In this example, commands to the server's graphics processing unit 112 may be captured and the contents (primitives) of the buffer may be sent to the client 401 via the network interface card 114. In one embodiment, an API may be exposed by the session manager 416, with which components can interface to determine whether a virtual GPU is available.
In embodiments, a virtual machine, such as virtual machine 240 of fig. 3 or 4, may be instantiated, and the virtual machine may act as a platform for execution of operating system 400. In this example, guest operating system 220 can embody operating system 400. The virtual machine may be instantiated upon receiving a connection request over a network. For example, parent partition 204 may include an instance of transport stack 410 and may be configured to receive a connection request. Parent partition 204 may initialize the virtual machine in response to the connection request along with the guest operating system including the ability to implement the remote session. The connection request may then be passed to transport stack 410 of guest operating system 220. In this example, each remote session may be instantiated on an operating system executed by its own virtual machine.
In one embodiment, a virtual machine may be instantiated and may execute guest operating system 220 that materializes operating system 400. Similar to the above, a virtual machine may be instantiated when a connection request is received over a network. The remote session may be generated by the operating system. Session manager 416 may be configured to determine that the request is for a session that supports 3D graphics rendering, and session manager 416 may load a console session. In addition to loading a console session, the session manager 416 can load a stack instance 414' for the session and configure the system to capture primitives generated by the user-mode display driver 522.
User mode driver 522 may generate primitives that may be captured and stored in a buffer accessible to transport stack 410. The kernel mode driver 530 may append the buffer to the run list of the application, and the GPU scheduler 528 may run and determine when to issue rendering commands for the buffer. When the scheduler 528 issues a rendering command, the command may be captured by, for example, the kernel mode driver 530 and sent to the client 401 via the stack instance 414'.
GPU scheduler 528 may execute and determine to issue an instruction to render the contents of the buffer. In this example, primitives associated with rendering instructions may be sent to client 401 via network interface card 114.
In an embodiment, at least one kernel-mode process may be executed by at least one logical processor 112, and at least one logical processor 112 may render vertices stored in different buffers simultaneously. For example, graphics processing scheduler 528, which may operate similar to an operating system scheduler, may schedule GPU operations. GPU scheduler 528 may merge the separate vertex buffers into the correct execution order so that the graphics processing units of client 401 execute the commands in an order that allows them to be rendered correctly.
One or more threads of a process, such as a video game, may map multiple buffers and each thread may issue a drawing command. Identification information for the vertices, such as information generated for each buffer, each vertex, or each batch of vertices in the buffer, may be sent to GPU scheduler 528. The information may be stored in a table with identification information associated with the vertices from the same or other processes and used to synchronize the renderings of the various buffers.
An application program, such as a word processing program, may execute and declare, for example, two buffers — one for storing vertices used to generate a 3D menu and the other for storing commands to generate letters that will populate the menu. The application may map the buffer and issue a draw command. The GPU scheduler 528 may determine the order in which to execute the two buffers so that the menu is rendered in a visually pleasing manner along with the letters. For example, other processes may issue draw commands at the same or substantially similar times, and if the vertices are not synchronized, the vertices from different threads of different processes may be rendered asynchronously on client 401, causing the final image displayed to appear chaotic or mixed.
A bulk compressor 450 may be used to compress the primitives before sending the data stream to the client 401. In an embodiment, bulk compressor 450 may be a user mode (not shown) or kernel mode component of stack instance 414 and may be configured to look for similar patterns in the data stream sent to client 401. In this embodiment, because bulk compressor 450 receives vertex streams from multiple applications rather than multiple API constructs, bulk compressor 450 has a larger set of vertex data to sift through to find opportunities for compression. That is, because the vertices of multiple processes are remoted rather than different API calls, there is a greater chance that the bulk compressor 450 will be able to find similar patterns in a given stream.
In an embodiment, the graphics processing unit 112 may be configured to use virtual addressing instead of physical addresses for memory. Thus, memory pages used as buffers may be paged from video memory to system RAM or disk. The stack instance 414' may be configured to obtain a virtual address for the buffer and send content from the virtual address upon capturing a rendering command from the graphics kernel 528.
Operating system 400 may be configured, for example, to load various subsystems and drivers to capture and send primitives to a remote computer, such as client 401. Similar to the above, the session manager 416 may be executed by the logical processor 102 and a session including a particular remote component may be initialized. In this example, the derived session may include the kernel 418, the graphics kernel 524, the user mode display driver 522, and the kernel mode display driver 530.
The graphics kernel may schedule GPU operations. GPU scheduler 528 may merge the separate vertex buffers into the correct execution order so that the graphics processing units of client 401 execute the commands in an order that allows them to be rendered correctly.
Referring to FIG. 7, a block diagram illustrating a decoding process is shown, according to one embodiment of the invention. The encoding process is shown in FIG. 8. The encoded tile may first be passed through an RLGR decoder 900 to generate quantized tile coefficients. This can be performed on the CPU.
Dequantization 705 may be implemented on the CPU using SSE2 instructions. After dequantization, the ten subbands of the three components of the tile may be copied into three Direct3D texture buffers of format L16, one for each of Y, U and V. These three textures may be uploaded onto the GPU and used as input by the inverse DWT stage 710.
All of these variations for implementing the above-mentioned partitions are merely exemplary implementations, and nothing herein should be construed as limiting the invention to any particular virtualization aspect.
Entropy coder
In a virtual desktop or remote presentation session, user graphics and video may be rendered at the server for each user. The resulting bitmap is then sent to the client for display and interaction. To reduce bandwidth requirements on the network, the bitmap may be compressed before being sent to the client. Compression techniques are expected to be efficient and have a short latency.
Described herein are systems and methods for encoding and decoding bitmaps and other graphics data. The encoding system may include a tiling system having a tiling module that initially partitions source image data into data tiles. A frame differencing module may then output only the modified data tiles to the respective processing modules, which convert the modified data tiles into corresponding tile components. In an embodiment, a quantizer may perform a compression process on the tile components according to an adjustable quantization parameter to generate compressed data. An adaptive entropy encoder selector may then select one of a plurality of entropy encoders to perform an entropy encoding process, thereby producing encoded data. The entropy encoder may also utilize a feedback loop to adjust the quantization parameter based on the current transmission bandwidth characteristics. The processes involved herein to compress, encode, and decode graphics data are largely those described in commonly assigned U.S. Patent 7,460,725, entitled "System And Method For Efficiently Encoding And Decoding Electronic Information," and U.S. Patent Application 12/399,302, entitled "Frame Capture, Encoding, And Transmission Management," filed March 6, 2009, both of which are hereby incorporated by reference in their entirety.
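The tiling and frame differencing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the incorporated references: the function names `split_into_tiles` and `changed_tiles`, the list-of-rows frame representation, and the default tile size of 64 are all hypothetical.

```python
def split_into_tiles(frame, tile_size):
    """Partition a 2D frame (a list of equal-length rows) into tiles.

    Returns a dict mapping (tile_x, tile_y) to an immutable tile so that
    tiles can be compared cheaply for equality.
    """
    tiles = {}
    height = len(frame)
    width = len(frame[0]) if height else 0
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            tile = tuple(tuple(row[x:x + tile_size])
                         for row in frame[y:y + tile_size])
            tiles[(x // tile_size, y // tile_size)] = tile
    return tiles


def changed_tiles(previous, current, tile_size=64):
    """Frame differencing: keep only the tiles that differ from the last frame.

    When there is no previous frame, every tile counts as modified.
    """
    prev_tiles = split_into_tiles(previous, tile_size) if previous is not None else {}
    cur_tiles = split_into_tiles(current, tile_size)
    return {pos: tile for pos, tile in cur_tiles.items()
            if prev_tiles.get(pos) != tile}
```

Only the tiles returned by `changed_tiles` would then be passed on to the transform, quantization, and entropy encoding stages.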
In the various methods and systems disclosed herein, improvements to the processing and operation of the foregoing various processes may be used to provide more efficient processing, and thus a more timely and rich user experience. The methods and systems also provide improvements in providing such image support when network and/or system resources are providing sufficient bandwidth and/or client devices have lower processing speeds or resources. The embodiments disclosed herein for rendering, encoding, and transmitting graphics data may be implemented using various combinations of hardware and software processes. In some embodiments, the functions may be performed entirely in hardware. In other embodiments, the functions may be performed entirely in software. In still other embodiments, the functions may be implemented using a combination of hardware and software processes. These processes may also be implemented using one or more CPUs and/or one or more special purpose processors, such as a Graphics Processing Unit (GPU) or other special purpose graphics rendering device.
Further, while the following description is provided in the context of a telepresence system, it is to be understood that the disclosed embodiments may be implemented in any type of system in which graphical data is encoded and compressed for communication over a network.
Various embodiments may include the use of Discrete Wavelet Transform (DWT) functionality to transform individual YUV components of individual tiles into corresponding YUV tile sub-bands. The quantizer function may compress the tile sub-bands by performing a quantization process using an appropriate quantization technique. The quantizer function may generate compressed image data by reducing the bit rate of the tile according to a specific compression ratio that may be specified by adaptive quantization parameters received from the entropy encoder via a feedback loop.
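As a rough sketch of the quantization step and its feedback loop, assuming a simple uniform quantizer (the text does not specify the quantization technique), the idea can be illustrated as follows. The names `quantize`, `dequantize`, and `adjust_step`, the doubling and halving policy, and the `max_step` limit are hypothetical choices made for illustration only.

```python
def quantize(coefficients, step):
    """Uniform quantization that truncates toward zero.

    A larger step maps more small coefficients to 0, which lengthens the
    zero runs that the entropy encoder can exploit.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    return [(abs(c) // step) * (1 if c >= 0 else -1) for c in coefficients]


def dequantize(levels, step):
    """Approximate reconstruction performed on the decoding side."""
    return [q * step for q in levels]


def adjust_step(step, encoded_bytes, budget_bytes, max_step=64):
    """Feedback loop (assumed policy): coarser when over budget, finer when well under."""
    if encoded_bytes > budget_bytes:
        return min(step * 2, max_step)
    if encoded_bytes * 2 < budget_bytes:
        return max(step // 2, 1)
    return step
```

For example, `quantize([0, 3, -3, 9, -17], 4)` yields `[0, 0, 0, 2, -4]`: the two small coefficients collapse to zero, and the remaining values need fewer bits.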
In one embodiment, a bitmap may be provided to the GPU with changed rectangles that need to be compressed. The bitmap may be further divided into logical tiles and only tiles that vary within the changed rectangle are encoded and compressed. In this manner, the process effectively implements a caching scheme in cooperation with a client that maintains and displays the resulting decoded image.
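Mapping a changed rectangle onto logical tiles can be sketched as below. This is an illustrative assumption about how such a lookup might work; the function name `tiles_in_rect` and the rectangle convention (left, top, right, bottom, with right and bottom exclusive) are hypothetical.

```python
def tiles_in_rect(rect, tile_size):
    """Return the (tile_x, tile_y) indices of every tile a rectangle touches.

    rect is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    """
    left, top, right, bottom = rect
    if right <= left or bottom <= top:
        return []  # empty rectangle touches no tiles
    first_tx, last_tx = left // tile_size, (right - 1) // tile_size
    first_ty, last_ty = top // tile_size, (bottom - 1) // tile_size
    return [(tx, ty)
            for ty in range(first_ty, last_ty + 1)
            for tx in range(first_tx, last_tx + 1)]
```

A 10-by-10 pixel update that straddles a tile boundary, such as `(60, 10, 70, 20)` with 64-pixel tiles, touches tiles `(0, 0)` and `(1, 0)`; only those tiles would need to be checked for changes and re-encoded.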
A telepresence compression algorithm is employed to reduce the bandwidth of the display stream to a level that is acceptable for transmission over local area networks, wide area networks, and low bandwidth networks. Such algorithms typically trade off between server-side CPU time and lower desired bandwidth.
An image compressor may be used, which may employ a stage (phase) called an entropy coder. The entropy encoder function may perform an entropy encoding process to generate encoded data. In some embodiments, the entropy encoding process further reduces the bit rate of the compressed image data by replacing bit patterns in the compressed image data received from the quantizer with appropriate codes.
Entropy coding employed in telepresence systems typically balances CPU performance (i.e., speed) against compression ratio. The entropy coder can be tuned for good compression at reasonable CPU speeds. Typical entropy coders include run-length, Huffman, arithmetic, and variations of Golomb-Rice coders. One of the main problems in designing an efficient entropy coder for telepresence applications is that the statistics of the integer blocks to be coded typically vary widely. Studies have shown that in most cases, the data before quantization has a probability distribution that is significantly more concentrated around 0 than a Gaussian distribution. The present invention is directed to implementing a simplified entropy encoder configured to increase encoding and decoding speed at the potential cost of some compression efficiency. This tradeoff is acceptable in many cases and is preferable in scenarios that are limited by low-speed CPUs rather than by bandwidth. The end result is that encoders/decoders can be provided that are 2 to 3 times faster than current encoders/decoders, with a 10% to 20% loss in compression.
Such an encoder/decoder may be useful because it allows optimizing for scenarios in which processor speed has a higher priority than saving every bit of bandwidth. For example, lower-end client devices may achieve better performance with a simpler compressor that allows faster processing. Telepresence systems are typically optimized to reduce bandwidth without regard to CPU cost and capability. In many systems today, bandwidth may be plentiful, while client devices may be simpler devices such as set-top boxes or thin clients.
In one embodiment, the entropy encoder may be configured to avoid the use of a variable bitstream format, because encoding and decoding variable bitstreams efficiently is comparatively slow. In an embodiment, the encoder may instead be configured to produce a regular, fixed-size encoded stream using nibble-sized (i.e., four-bit) codes. Such a stream can be decoded faster and safely (with full overflow checking) at much lower CPU cost.
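A minimal sketch of a nibble-sized stream, with two four-bit codes per byte, might look like the following. The function names and the choice to place the first nibble in the high half of the byte are assumptions for illustration; the text does not fix a nibble order.

```python
def pack_nibbles(nibbles):
    """Pack four-bit codes two per byte, first nibble in the high half.

    An odd trailing nibble is padded with a zero low nibble.
    """
    out = bytearray()
    for i in range(0, len(nibbles), 2):
        high = nibbles[i]
        low = nibbles[i + 1] if i + 1 < len(nibbles) else 0
        if not (0 <= high <= 15 and 0 <= low <= 15):
            raise ValueError("nibble out of range")
        out.append((high << 4) | low)
    return bytes(out)


def unpack_nibbles(data, count):
    """Read `count` nibbles back, checking for overflow before touching the buffer."""
    if count > 2 * len(data):
        raise ValueError("nibble count exceeds buffer size")
    return [data[i // 2] >> 4 if i % 2 == 0 else data[i // 2] & 0x0F
            for i in range(count)]
```

Because every code has the same width, the decoder can verify the entire read against the buffer length with a single comparison rather than re-checking after each variable-length symbol.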
In one example approach, the encoder encodes the following types of operations:
1) Zero runs (a common input to entropy coders) - zero runs are encoded with a variable number of nibbles, which matches the statistical observation that most runs are very short.
2) Literal values - these are encoded either as a Least Recently Used (LRU) hit in a table (cache) of recently seen literal values or as a variable number of nibbles, which takes advantage of the statistical property that smaller values are more likely to occur.
In both cases, there are actually two streams: (a) a nibble stream of opcodes and (b) a large-value stream. Some opcodes for run lengths or literal lengths simply indicate "get the next value from the large-value stream".
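The two operation types and the two streams can be sketched as follows. The opcode assignments here are entirely hypothetical and are not taken from Annex A: opcodes 0x0 to 0x3 denote a hit at that position of a four-entry cache of recent literals, 0x4 to 0xB denote a zero run of length 1 to 8, 0xC and 0xE fetch a run length or literal from the large-value stream, and 0xD is followed by one more nibble that holds a small literal. Signed literals are mapped to unsigned codes with a zigzag mapping, which is also an assumption.

```python
CACHE_SIZE = 4        # opcodes 0x0..0x3: hit at that cache position
ZERO_RUN_BASE = 0x4   # opcodes 0x4..0xB: zero run of length 1..8
LONG_ZERO_RUN = 0xC   # run length is the next entry of the large-value stream
SMALL_LITERAL = 0xD   # the next nibble holds the zigzag code (1..15)
LARGE_LITERAL = 0xE   # zigzag code is the next entry of the large-value stream


def zigzag(value):
    return 2 * value if value >= 0 else -2 * value - 1


def unzigzag(code):
    return code // 2 if code % 2 == 0 else -(code + 1) // 2


def touch(cache, value):
    """Move value to the front of the recently-used list, evicting the oldest."""
    if value in cache:
        cache.remove(value)
    cache.insert(0, value)
    del cache[CACHE_SIZE:]


def encode(values):
    opcodes, large_values, cache = [], [], []
    i = 0
    while i < len(values):
        if values[i] == 0:
            run = 1
            while i + run < len(values) and values[i + run] == 0:
                run += 1
            if run <= 8:
                opcodes.append(ZERO_RUN_BASE + run - 1)
            else:
                opcodes.append(LONG_ZERO_RUN)
                large_values.append(run)
            i += run
            continue
        value = values[i]
        if value in cache:
            opcodes.append(cache.index(value))
        elif zigzag(value) <= 15:
            opcodes += [SMALL_LITERAL, zigzag(value)]
        else:
            opcodes.append(LARGE_LITERAL)
            large_values.append(zigzag(value))
        touch(cache, value)
        i += 1
    return opcodes, large_values


def decode(opcodes, large_values):
    values, cache = [], []
    ops, large = iter(opcodes), iter(large_values)
    for op in ops:
        if ZERO_RUN_BASE <= op < LONG_ZERO_RUN:
            values.extend([0] * (op - ZERO_RUN_BASE + 1))
            continue
        if op == LONG_ZERO_RUN:
            values.extend([0] * next(large))
            continue
        if op < CACHE_SIZE:
            value = cache[op]
        elif op == SMALL_LITERAL:
            value = unzigzag(next(ops))
        elif op == LARGE_LITERAL:
            value = unzigzag(next(large))
        else:
            raise ValueError("unknown opcode")
        values.append(value)
        touch(cache, value)
    return values
```

In this sketch a short zero run or a cache hit costs a single nibble, a small literal costs two nibbles, and only rare large values spill into the second stream. The decoder mirrors the encoder's cache updates so that both sides agree on cache positions.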
The large-value stream may be encoded using a basic multi-byte encoding scheme that uses fewer bytes for small values than for large values while guaranteeing that it operates only on fixed byte boundaries. With such a scheme, both the nibble stream and the value stream can be decoded without complex bit shifting or variable-bit decoding, allowing much faster performance than more general or complex entropy coders (e.g., RLGR or various Huffman-based schemes). Such a simplified encoder may be configured so that values of any bit width (e.g., from 1 to 32 bits) can be encoded. In more complex coders, decoding is computationally demanding because of the variable bit lengths and the processing they require, which typically involves many code branches and extensive bookkeeping. The simplified scheme minimizes this complexity by using regularly sized structures (e.g., nibbles). In this scheme, no symbol smaller than four bits is output, and data is a whole number of bytes, with no shifting or rotating. A byte may contain two codes, which may be processed in parallel if desired. Furthermore, if the number of nibbles is known, buffer overflows can be avoided. In experiments using a typical telepresence system, a 2- to 3-fold increase in performance was measured on currently available CPUs, with only a 10-20% loss in compression.
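One byte-aligned scheme with this property is a base-128 variable-length integer, sketched below. This layout is an assumption chosen for illustration, since the text does not specify the byte format: each byte carries seven value bits, and the high bit signals that another byte follows. The decoder still shifts by multiples of seven to assemble the value, but every read is a whole byte and never straddles a byte boundary.

```python
def encode_large_value(value):
    """Encode an unsigned value of up to 32 bits in 1 to 5 whole bytes."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("value must fit in 32 bits")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def decode_large_value(data, offset=0):
    """Decode one value starting at offset; return (value, next_offset)."""
    result, shift = 0, 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated value")
        if shift > 28:
            raise ValueError("value longer than 5 bytes")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, offset
```

Values below 128 take a single byte, and a full 32-bit value takes five, so the common small values stay cheap while the stream remains byte-aligned.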
In some embodiments, logic may be provided for switching between a more complex/slow entropy encoder and the simpler entropy encoder described herein. For example, referring to fig. 9, a telepresence system may provide at least two encoders 910 and 920. The encoder 910 may be a complex entropy encoder such as one that implements RLGR. The encoder 920 may be a simplified encoder according to the present invention. Depending on the conditions of network 930, the system may select one of encoders 910 or 920 to encode data 900. For example, if network conditions indicate network congestion and available bandwidth is limited, a complex encoder 910 may be selected to encode the data 900, thereby minimizing the amount of data to be transmitted over the network 930. Similarly, if the network conditions indicate that the network is not congested, a simplified encoder 920 may be selected to encode the data 900, thereby providing faster processor performance at the client.
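The selection logic can be sketched in a few lines. The `Encoder` enum, the `select_encoder` function, the bandwidth units, and the default threshold are hypothetical; the text only states that the choice depends on network conditions.

```python
from enum import Enum


class Encoder(Enum):
    COMPLEX = "complex, e.g. RLGR"       # corresponds to encoder 910
    SIMPLIFIED = "simplified, nibble"    # corresponds to encoder 920


def select_encoder(available_kbps, congested, threshold_kbps=2000):
    """Favor compression when bandwidth is scarce, decoding speed otherwise."""
    if congested or available_kbps < threshold_kbps:
        return Encoder.COMPLEX
    return Encoder.SIMPLIFIED
```

A real system would presumably re-evaluate this choice as network measurements change and signal the selected encoder to the client so that it picks the matching decoder.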
Annex A provides an example implementation of a simplified encoder according to the invention.
FIG. 10 depicts an exemplary operational procedure for processing graphics data for delivery to a client computer, including operations 1000, 1002, 1004, and 1006. Referring to FIG. 10, operation 1000 begins the operational procedure and operation 1002 shows receiving graphics data representing a client screen associated with a virtual machine session. Operation 1004 shows receiving information indicating available bandwidth for the transmission and, based on the information, determining that the available bandwidth satisfies a predetermined threshold. Operation 1006 shows entropy encoding coefficients of the transformed graphics data using a compact stream of bit tokens aligned with byte boundaries. In an embodiment, zero runs are encoded with a variable number of multiples of a span size, literal values are encoded using entries in a cache of recently used literal values, and other values are encoded using a minimum number of multiples of the span size. A bit token may be a string of bits that defines a unit of data. For example, in a nibble-based system, four-bit tokens are used.
In various embodiments, the span size may be a nibble. In some embodiments, the operational procedure may include generating an entropy-encoded stream of opcodes and a large-value stream. The procedure may also include entropy encoding the large-value stream using a multi-byte encoding scheme that uses fewer bytes for small values than for large values, dividing the graphics data into data tiles, processing the data tiles into tile components, and performing the entropy encoding on the tile components. The encoding scheme may be configured to operate only on fixed byte boundaries.
FIG. 11 depicts an exemplary system for processing graphical data for delivery to a client computer as described above. Referring to fig. 11, system 1100 includes a processor 1110 and a memory 1120. The memory 1120 also includes computer instructions configured to process the graphics data transmitted to the remote computing device. Block 1122 illustrates receiving graphical data representing a client screen associated with a virtual machine session. Block 1124 illustrates dividing the graphics data into data tiles. Block 1126 shows entropy encoding coefficients of a transformed data tile using a stream of bit tokens aligned with byte boundaries.
Any of the above-mentioned aspects may be implemented as a method, system, computer-readable medium, or any type of article of manufacture. For example, a computer-readable medium may store thereon computer-executable instructions for processing graphics data to be transmitted to a client computer. Such a medium may include a first subset of instructions for receiving graphics data representing graphics data associated with a virtual machine session, and a second subset of instructions for entropy encoding the transformed graphics data using a compact stream of bit tokens aligned with byte boundaries such that the encoded data may be decoded using a byte-based decoding process. Those skilled in the art will appreciate that additional instruction sets may be used to capture various other aspects disclosed herein, and that the two presently disclosed subsets of instructions may differ in detail in accordance with the present invention.
The foregoing detailed description has set forth various embodiments of the systems and/or processes via examples and/or operational diagrams. To the extent that such block diagrams and/or examples contain one or more functions and/or operations, those skilled in the art will appreciate that each function and/or operation in such block diagrams or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof.
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs that may implement or utilize the processes described in connection with the present invention, e.g., through the use of an API, reusable controls, or the like. Such programs are preferably implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and combined with hardware implementations.
While the invention has been particularly shown and described with reference to a preferred embodiment thereof, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the scope of the invention as set forth in the following claims. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

Claims (10)

1. A method (1000) of processing graphics data for transmission to a remote computing device, the method comprising:
receiving graphical data representing a client screen associated with a virtual machine session (1002);
receiving information indicating available bandwidth for the transmission and, based on the information, determining that the available bandwidth meets a predetermined threshold (1004); and
entropy encoding (1006) coefficients of the transformed graphics data using a compact stream of bit tokens forming groups aligned with byte boundaries, wherein:
zero runs are encoded (1000) with a variable number of multiples of a span size;
literal values are encoded (1000) using entries in a cache of recently used literal values; and
other values are encoded (1000) using a minimum number of multiples of the span size.
2. The method of claim 1, further comprising generating a stream (1000) that entropy encodes the opcode and the large value stream.
3. The method of claim 2, further comprising entropy encoding (1000) the large value stream using a multi-byte encoding scheme that uses fewer bytes for small values than for large values.
4. The method of claim 3, wherein the encoding scheme is configured to run only on fixed byte boundaries (1000).
5. The method of claim 3, further comprising dividing the graphics data into data tiles (1000), processing the data tiles into tile components, and performing the entropy encoding on the tile components.
6. A system (1100) for processing graphics data for transmission to a remote computing device, comprising:
a computing device comprising at least one processor (1110);
a memory (1120) communicatively coupled to the processor (1110) when the system (1100) is run, the memory (1120) having stored therein computer instructions (1120) that, when executed by the at least one processor (1110), cause:
receiving graphical data representing a client screen associated with a virtual machine session (1122);
dividing the graphics data into data tiles (1124);
entropy encoding (1126) coefficients of the transformed data tile using a stream of bit tokens forming groups aligned with byte boundaries, wherein:
zero runs are encoded (1000) with a variable number of multiples of a span size;
literal values are encoded (1000) using entries in a cache of recently used literal values; and
other values are encoded (1000) using a minimum number of multiples of the span size.
7. The system of claim 6, further comprising transmitting the encoded coefficients to a computing device (1100) configured to process the encoded coefficients based on the span size.
8. The system of claim 6, wherein the encoded data is available for efficient decoding (1000) by an entropy decoding process configured to operate on the encoded data on a per byte basis.
9. The system of claim 6, further comprising entropy encoding (1000) the large value stream using a multi-byte encoding scheme that uses fewer bytes for small values than for large values.
10. A computer-readable storage medium (110) having stored thereon computer-executable instructions (122) for processing graphics data for transmission to a client computer, the instructions for:
receiving graphical data representing a client screen associated with a virtual machine session (1002); and
entropy encoding (1006) coefficients of the transformed graphics data using a compact stream of bit tokens that constitute groups aligned with byte boundaries, such that the encoded data can be decoded using a byte-based decoding process, wherein:
zero runs are encoded (1000) with a variable number of nibbles;
literal values are encoded (1000) using entries in a cache of recently used literal values; and
other values are encoded (1000) using a minimum number of nibbles.
HK12111379.4A 2010-09-30 2012-11-09 Entropy coder for image compression HK1170876A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/894,793 2010-09-30

Publications (1)

Publication Number Publication Date
HK1170876A true HK1170876A (en) 2013-03-08
