JP2006514349A - Method and apparatus for determining processor state without interfering with processor operation

Info

Publication number: JP2006514349A
Application number: JP2004521558A
Authority: JP
Inventors: Timothy J. Wood; Scott A. White
Original assignee: Advanced Micro Devices Inc
Current assignee: Advanced Micro Devices Inc
Priority date: 2002-07-11
Filing date: 2003-07-09
Publication date: 2006-04-27
Also published as: AU2003261128A8; TW200401194A; EP1576475A2; WO2004008319A3; CN1669004A; AU2003261128A1; WO2004008319A2

Abstract

A method and apparatus for determining the internal state of a host processor. Test data is loaded into the output port 152 of the service processor 140. The service processor 140 polls a valid bit stored in the host processor 10. After determining that the valid bit is clear, the service processor 140 transmits the test data to the host processor 10 and sets the valid bit. State data is generated in response to the test data and is written to the output port 104 of the host processor 10. The service processor 140 receives the data from the output port 104 of the host processor 10. The operations for determining the state of the host processor 10 are performed without interrupting the execution of instructions by the host processor 10.

Description

The present invention relates to processors, and more particularly to a method and apparatus for determining the state of a processor.

When new computer systems and system software are designed, various techniques are used to measure how the hardware mechanisms behave while the system processor executes instructions. One form of measuring the operation of system hardware is to determine the state of the processor at various stages of program execution. Determining the state of a processor may include querying the processor by sending data to it. The data received by the processor is processed, and additional data indicating the state of the processor is generated. Such data may include the contents of registers, reservation stations, and the like. This additional data is then transmitted from the processor for observation. Alternatively, the processor may be configured to periodically transmit data to an external source, where the data is used to determine the state of the processor or other information.

Various tools have been developed to query a processor and determine its state. These tools allow test data to be input to the processor, state data to be generated in response to the processor receiving the test data, and the state data to be transmitted from the processor for observation. One difficulty with such tools concerns the execution of instructions while the processor is being queried. Typically, such tools require the operation of the processor to be interrupted during a processor query. The instruction stream executed by the processor must be interrupted to input the test data, to generate the state data, and to output the state data. Inputting test data and outputting state data require the use of existing input and output ports, which interrupts the input and output of other data to and from the processor. Furthermore, the input of test data causes the processor to switch to a separate service routine, which suspends the instruction stream currently being executed. These factors make it difficult to determine the speed at which a particular routine executes on the processor. In addition, the switch to a separate service routine can make it impossible to determine the true state of the processor from the processor query.

A method and apparatus for determining the state of a host processor are disclosed. In one embodiment, a service processor polls a valid bit stored in a register of the host processor's receiver. When the valid bit is determined to be clear, the service processor loads test data into its output register, transmits the test data to the host processor, and sets the valid bit when the transmission is complete. In response to the host processor determining that the valid bit has been set, the test data is read from the register of the receiver. State data representing the state of the host processor is generated in response to the test data. The host processor can poll the valid bit of the transmitter register of its output port and, in response to detecting that the valid bit is clear, stores data in the transmitter register. The data is then transmitted to the receiver of the service processor, and the valid bit is set in response to that transmission. The operations of transmitting data to the host processor and reading data from the host processor are performed without interrupting the execution of instructions. Furthermore, the transmission and reception of data by the host processor occur independently of each other.

In one embodiment, the host processor has an input port and an output port. The input port of the host processor is configured to be coupled to the output port of the service processor, while the output port of the host processor is configured to be coupled to the input port of the service processor. The input and output ports of both the host processor and the service processor each include a register configured to store a number of bits. Each register is configured to store a valid bit which, when set, indicates that the data stored in the register is valid. The register of an input port is configured to receive data only when its valid bit is clear. When data is transmitted from an output port to an input port, the valid bit is set in the register of the input port. Both the host processor and the service processor are configured to read from the register of their respective input ports in response to detecting that the valid bit is set.
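The valid-bit handshake described above can be illustrated with a minimal Python sketch. The class and variable names (`PortRegister`, `host_rx`, `service_rx`) are hypothetical, the "state data" is a placeholder computation, and clearing the valid bit on a read is an assumption; the text only states when the bit is set and that writes require it to be clear.

```python
class PortRegister:
    """Data register with a valid bit, modelling one input port (hypothetical)."""

    def __init__(self):
        self.data = 0
        self.valid = False

    def try_send(self, value):
        """Sender side: poll the valid bit and write only if it is clear."""
        if self.valid:
            return False          # previous data has not been consumed yet
        self.data = value
        self.valid = True         # set once the transfer is complete
        return True

    def try_receive(self):
        """Receiver side: poll the valid bit and read only if it is set."""
        if not self.valid:
            return None
        self.valid = False        # assumption: reading clears the valid bit
        return self.data


host_rx = PortRegister()     # host input port (service processor -> host)
service_rx = PortRegister()  # service input port (host -> service processor)

assert host_rx.try_send(0xA5)        # service processor sends test data
assert not host_rx.try_send(0x5A)    # refused while the valid bit is set
test_data = host_rx.try_receive()    # host reads when it polls the bit
state = test_data ^ 0xFF             # placeholder for generated state data
assert service_rx.try_send(state)    # host replies on the independent channel
```

Because each direction has its own register and valid bit, neither side has to be interrupted; each simply polls at a point convenient to it.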

In one embodiment of the service processor, data is loaded into the register of the output port, or read from the input port, through a boundary scan test access port (TAP) conforming to the IEEE 1149.1 standard. Data is loaded serially into the register of the service processor's output port through a test data input (TDI) pin. Similarly, data is shifted out serially from the register of the service processor's output port through a test data output (TDO) pin.

In various embodiments, the coupled pairs of input and output ports operate independently of each other. That is, the output port of the service processor transmits data to the input port of the host processor independently of any data transmission from the output port of the host processor to the input port of the service processor. In still other embodiments, data is transmitted to or from the processor periodically, or data is transmitted from the processor in response to individual queries.

While the invention is amenable to various modifications and alternative forms, specific embodiments are shown by way of example and are described in detail below. It should be understood, however, that the specific embodiments shown are not intended to limit the invention to the particular form disclosed; on the contrary, the invention covers all modifications, equivalents, and alternatives falling within the scope of the invention as defined by the appended claims.

FIG. 1 shows a block diagram of one embodiment of a processor 10. Other embodiments are possible and contemplated. As shown in FIG. 1, the processor 10 includes a prefetch/predecode unit 12, a branch prediction unit 14, an instruction cache 16, an instruction alignment unit 18, a plurality of decode units 20A-20C, a plurality of reservation stations 22A-22C, a plurality of functional units 24A-24C, a load/store unit 26, a data cache 28, a register file 30, a reorder buffer 32, an MROM unit 34, and a bus interface unit 37. Elements referred to herein by a particular reference number followed by a letter are collectively referred to by the reference number alone. For example, the decode units 20A-20C are collectively referred to as the decode units 20.

The prefetch/predecode unit 12 is coupled to receive instructions from the bus interface unit 37, and is further coupled to the instruction cache 16 and the branch prediction unit 14. Similarly, the branch prediction unit 14 is coupled to the instruction cache 16. The branch prediction unit 14 is further coupled to the decode units 20 and the functional units 24. The instruction cache 16 is further coupled to the MROM unit 34 and the instruction alignment unit 18. The instruction alignment unit 18 is in turn coupled to the decode units 20. Each decode unit 20A-20C is coupled to the load/store unit 26 and to a respective reservation station 22A-22C. The reservation stations 22A-22C are further coupled to respective functional units 24A-24C. In addition, the decode units 20 and the reservation stations 22 are coupled to the register file 30 and the reorder buffer 32. The functional units 24 are also coupled to the load/store unit 26, the register file 30, and the reorder buffer 32. The data cache 28 is coupled to the load/store unit 26 and to the bus interface unit 37. The bus interface unit 37 is further coupled to an L2 interface to an L2 cache and to a bus. Finally, the MROM unit 34 is coupled to the decode units 20.

The instruction cache 16 is a high-speed cache memory provided to store instructions. Instructions are fetched from the instruction cache 16 and dispatched to the decode units 20. In one embodiment, the instruction cache 16 is configured to store up to 64 kilobytes of instructions in a two-way set-associative structure having 64-byte lines (a byte consists of 8 binary bits). Alternatively, any other desired structure and size may be used. For example, note that the instruction cache 16 may be implemented as a fully associative, set-associative, or direct-mapped structure.
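The geometry of the example configuration (64 kilobytes, two ways, 64-byte lines) can be worked out as follows. This is an illustrative sketch only: the text does not say how addresses are split or whether the cache is indexed by linear or physical address, and `split_address` is a hypothetical helper.

```python
CACHE_BYTES = 64 * 1024   # total capacity from the example embodiment
WAYS = 2                  # two-way set associative
LINE_BYTES = 64           # bytes per cache line

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 512 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1       # 6 bits select a byte in a line
INDEX_BITS = NUM_SETS.bit_length() - 1          # 9 bits select a set


def split_address(addr):
    """Split an address into (tag, set index, byte offset).

    Assumes a conventional layout: offset in the low bits, then index, then tag.
    """
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

For example, under these assumptions `split_address(0x12345)` yields tag 2, set index 141, and byte offset 5.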

Instructions are stored into the instruction cache 16 by the prefetch/predecode unit 12. In accordance with a prefetch scheme, instructions may be prefetched before they are requested from the instruction cache 16. Various prefetch schemes may be used by the prefetch/predecode unit 12. As the prefetch/predecode unit 12 transfers instructions to the instruction cache 16, it generates predecode data corresponding to the instructions. For example, in one embodiment, the prefetch/predecode unit 12 generates three predecode bits for each byte of an instruction: a start bit, an end bit, and a functional bit. The predecode bits form tags indicating the boundaries of each instruction. The predecode tags also convey additional information, such as whether a given instruction can be decoded directly by the decode units 20 or whether the instruction is executed by invoking a microcode procedure controlled by the MROM unit 34. Furthermore, the prefetch/predecode unit 12 is configured to detect branch instructions and to store branch prediction information corresponding to the branch instructions in the branch prediction unit 14. Other embodiments may use any suitable predecode scheme, or no predecode at all, as desired.

One encoding of the predecode tags for an embodiment of the processor 10 using a variable byte length instruction set is described below. A variable byte length instruction set is an instruction set in which different instructions occupy different numbers of bytes. A typical variable byte length instruction set used by one embodiment of the processor 10 is the x86 instruction set.

In the exemplary encoding, if a given byte is the first byte of an instruction, the start bit for that byte is set. If the byte is the last byte of an instruction, the end bit for that byte is set. Instructions that can be decoded directly by the decode units 20 are referred to as "fast path" instructions. The remaining x86 instructions are referred to as MROM instructions, according to one embodiment. For fast path instructions, the functional bit is set for each prefix byte included in the instruction and cleared for the other bytes. Conversely, for MROM instructions, the functional bit is cleared for each prefix byte and set for the other bytes. The type of instruction is determined by examining the functional bit corresponding to the end byte. If that functional bit is clear, the instruction is a fast path instruction. Conversely, if that functional bit is set, the instruction is an MROM instruction. The opcode of an instruction that can be decoded directly by the decode units 20 can thereby be located as the byte associated with the first clear functional bit in the instruction. For example, a fast path instruction including two prefix bytes, a ModR/M byte, and an immediate byte has the following start, end, and functional bits:
Start bits 10000
End bits 00001
Functional bits 11000
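The encoding just described can be sketched in a few lines of Python. The function names are hypothetical, and the bit strings are written one character per instruction byte, first byte on the left, matching the example. Note that the five-bit example implies an opcode byte between the two prefix bytes and the ModR/M byte; that reading is inferred from the bit count rather than stated outright.

```python
def predecode_bits(num_prefix, length, is_mrom):
    """Return (start, end, functional) bit strings, one bit per instruction byte."""
    start = "1" + "0" * (length - 1)        # set only on the first byte
    end = "0" * (length - 1) + "1"          # set only on the last byte
    # Fast path: functional bit set on prefix bytes, clear elsewhere.
    # MROM: the inverse (clear on prefix bytes, set elsewhere).
    prefix_bit, other_bit = ("0", "1") if is_mrom else ("1", "0")
    functional = prefix_bit * num_prefix + other_bit * (length - num_prefix)
    return start, end, functional


def is_mrom_instruction(functional):
    """The functional bit of the end byte distinguishes MROM from fast path."""
    return functional[-1] == "1"


def fast_path_opcode_index(functional):
    """For a fast path instruction, the opcode is the first byte whose
    functional bit is clear."""
    return functional.index("0")
```

With two prefix bytes and a total length of five bytes, `predecode_bits(2, 5, False)` reproduces the start, end, and functional bits listed in the example, and `fast_path_opcode_index("11000")` points at the third byte.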

MROM instructions are instructions determined to be too complex to be decoded by the decode units 20. MROM instructions are executed by invoking the MROM unit 34. More specifically, when an MROM instruction is encountered, the MROM unit 34 parses the instruction and issues it as a subset of defined fast path instructions that accomplish the desired operation. The MROM unit 34 dispatches the subset of fast path instructions to the decode units 20.

The processor 10 uses branch prediction to speculatively fetch instructions following conditional branch instructions. The branch prediction unit 14 is included to perform branch prediction operations. In one embodiment, the branch prediction unit 14 uses a branch target buffer, which caches up to two branch target addresses and the corresponding taken/not-taken predictions for each 16-byte portion of a cache line in the instruction cache 16. The branch target buffer has, for example, 2048 entries or any other suitable number of entries. The prefetch/predecode unit 12 determines the initial branch targets when a particular line is predecoded. Subsequent updates to the branch targets corresponding to a cache line occur as the instructions in the cache line are executed. The instruction cache 16 provides an indication of the instruction address being fetched, so that the branch prediction unit 14 can determine which branch target addresses to select in forming a branch prediction. The decode units 20 and the functional units 24 send update information to the branch prediction unit 14.
The decode units 20 detect branch instructions that were not predicted by the branch prediction unit 14. The functional units 24 execute the branch instructions and determine whether the predicted branch direction is incorrect. If the branch direction is "taken", subsequent instructions are fetched from the target address of the branch instruction. Conversely, if the branch direction is "not taken", subsequent instructions are fetched from the memory locations following the branch instruction. When a mispredicted branch instruction is detected, the instructions following the mispredicted branch are discarded from the various units of the processor 10. In an alternative configuration, the branch prediction unit 14 is coupled to the reorder buffer 32 instead of the decode units 20 and the functional units 24, and receives branch misprediction information from the reorder buffer 32. A variety of suitable branch prediction algorithms may be used by the branch prediction unit 14.
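A toy model may help picture a branch target buffer that holds up to two targets per 16-byte portion of a cache line. Everything beyond those two figures is an assumption made for illustration: the class and method names are hypothetical, evicting the oldest entry is an invented replacement policy, and the lookup rule (first taken branch at or after the fetch address) is one plausible choice, not the patent's stated algorithm.

```python
from collections import defaultdict

PORTION_BYTES = 16   # predictions are kept per 16-byte portion of a line
MAX_TARGETS = 2      # up to two branch targets per portion


class BranchTargetBuffer:
    """Hypothetical sketch of a per-portion branch target store."""

    def __init__(self):
        # portion number -> list of (branch_addr, target_addr, taken)
        self.entries = defaultdict(list)

    def update(self, branch_addr, target_addr, taken):
        """Record the outcome of an executed branch."""
        slot = self.entries[branch_addr // PORTION_BYTES]
        slot[:] = [e for e in slot if e[0] != branch_addr]
        if len(slot) == MAX_TARGETS:
            slot.pop(0)               # assumption: evict the oldest entry
        slot.append((branch_addr, target_addr, taken))

    def predict(self, fetch_addr):
        """Return the predicted target, or None to fall through sequentially."""
        slot = self.entries.get(fetch_addr // PORTION_BYTES, [])
        for branch_addr, target_addr, taken in sorted(slot):
            if branch_addr >= fetch_addr and taken:
                return target_addr
        return None
```

A `None` result corresponds to the "not taken" case, in which fetching continues from the memory locations following the branch.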

Instructions fetched from the instruction cache 16 are conveyed to the instruction alignment unit 18. As instructions are fetched from the instruction cache 16, the corresponding predecode data is scanned to provide information about the fetched instructions to the instruction alignment unit 18 (and to the MROM unit 34). The instruction alignment unit 18 uses the scan data to align an instruction to each of the decode units 20. In one embodiment, the instruction alignment unit 18 aligns instructions from three sets of eight instruction bytes to the decode units 20. The decode unit 20A receives an instruction that precedes, in program order, the instructions concurrently received by the decode units 20B and 20C. Similarly, the decode unit 20B receives an instruction that precedes, in program order, the instruction concurrently received by the decode unit 20C. In some embodiments (for example, embodiments using a fixed-length instruction set), the instruction alignment unit 18 may be eliminated.

The decode units 20 are configured to decode instructions received from the instruction alignment unit 18. Register operand information is detected and routed to the register file 30 and the reorder buffer 32. In addition, if an instruction requires one or more memory operations to be performed, the decode units 20 dispatch the memory operations to the load/store unit 26. Each instruction is decoded into a set of control values for the functional units 24, and these control values are dispatched to the reservation stations 22 along with operand address information and any displacement or immediate data included in the instruction. In one particular embodiment, each instruction is decoded into up to two operations, which are executed separately by the functional units 24A-24C.

The processor 10 supports out-of-order execution. It therefore uses the reorder buffer 32 to keep track of the original program sequence for register read and write operations, to implement register renaming, to allow speculative instruction execution and branch misprediction recovery, and to facilitate precise exceptions. A temporary storage location within the reorder buffer 32 is reserved upon decode of an instruction that involves the update of a register, and thereby stores speculative register state. If a branch prediction is incorrect, the results of the speculatively executed instructions along the mispredicted path can be invalidated in the buffer before they are written to the register file 30. Similarly, if a particular instruction causes an exception, the instructions following that instruction are discarded. In this manner, exceptions are "precise" (that is, instructions following the instruction that causes the exception are not completed before the exception). Note that a particular instruction is speculatively executed if it is executed before instructions that precede it in program order. A preceding instruction may be a branch instruction or an instruction that causes an exception, in which case the speculative results are discarded by the reorder buffer 32.

The decoded instructions provided at the outputs of the decode units 20 are routed directly to the respective reservation stations 22. In one embodiment, each reservation station 22 can hold instruction information (for example, the decoded instruction as well as operand values, operand tags, and/or immediate data) for up to six pending instructions awaiting issue to the corresponding functional unit. Note that, for the embodiment of FIG. 1, each reservation station 22 is associated with a dedicated functional unit 24. Accordingly, three dedicated "issue portions" are formed by the reservation stations 22 and the functional units 24. That is, issue portion 0 is formed by the reservation station 22A and the functional unit 24A. Instructions aligned and dispatched to the reservation station 22A are executed by the functional unit 24A. Similarly, issue portion 1 is formed by the reservation station 22B and the functional unit 24B, and issue portion 2 is formed by the reservation station 22C and the functional unit 24C.

Upon decode of a particular instruction, if a required operand is in a register location, register address information is routed to the reorder buffer 32 and the register file 30 simultaneously. The register file 30 has a storage location for each of the architected registers included in the instruction set implemented by the processor 10. Additional storage locations may be included in the register file 30 for use by the MROM unit 34. The reorder buffer 32 has temporary storage locations for results that change the contents of these registers, thereby allowing out-of-order execution. A temporary storage location of the reorder buffer 32 is reserved for each instruction that, upon decode, is determined to modify the contents of one of the real registers. Therefore, at various points during the execution of a particular program, the reorder buffer 32 may have one or more locations that contain the speculatively executed contents of a given register.
Following decode of a given instruction, if it is determined that the reorder buffer 32 has one or more previous locations assigned to a register used as an operand in the given instruction, the reorder buffer 32 forwards to the corresponding reservation station either 1) the value in the most recently assigned location, or 2) a tag for the most recently assigned location, if the value has not yet been produced by the functional unit that will eventually execute the previous instruction. If the reorder buffer 32 has a location reserved for a given register, the operand value (or reorder buffer tag) is provided from the reorder buffer 32 rather than from the register file 30. If the reorder buffer 32 has no location reserved for a required register, the value is taken directly from the register file 30. If the operand corresponds to a memory location, the operand value is provided to the reservation station through the load/store unit 26.
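The operand lookup rule above (most recent reorder buffer location first, register file as a fallback) can be sketched as follows. The data layout is an assumption made for illustration: the reorder buffer is modelled as a list of dictionaries in program order, and the names `read_operand`, `dest`, `tag`, and `value` are hypothetical.

```python
def read_operand(reg, reorder_buffer, register_file):
    """Return ("value", v) or ("tag", t) for a register operand.

    reorder_buffer: list of entries in program order, oldest first; each entry
    is a dict with keys "tag", "dest", and "value" (None until the result has
    been produced by a functional unit).
    """
    for entry in reversed(reorder_buffer):      # most recently assigned first
        if entry["dest"] == reg:
            if entry["value"] is not None:
                return ("value", entry["value"])   # speculative value is ready
            return ("tag", entry["tag"])           # wait for this result
    # No location is reserved for this register: use the committed state.
    return ("value", register_file[reg])
```

A reservation station that receives a tag instead of a value would then wait for the matching result to be forwarded.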

In one specific embodiment, reorder buffer 32 is configured to store and manipulate concurrently decoded instructions as a unit. This configuration is referred to herein as "line-oriented". By manipulating several instructions together, the hardware employed within reorder buffer 32 may be simplified. For example, the line-oriented reorder buffer included in the present embodiment allocates storage sufficient for instruction information pertaining to three instructions (one from each decode unit 20) whenever one or more instructions are dispatched by decode units 20. By contrast, a conventional reorder buffer allocates a variable amount of storage, depending on the number of instructions actually dispatched, and a comparatively large number of logic gates may be required to allocate the variable amount of storage. When each of the concurrently decoded instructions has executed, the instruction results are stored into register file 30 simultaneously. The storage is then free for allocation to another set of concurrently decoded instructions. Additionally, the amount of control logic circuitry employed per instruction is reduced because the control logic is amortized over several concurrently decoded instructions. A reorder buffer tag identifying a particular instruction may be divided into two fields: a line tag and an offset tag. The line tag identifies the set of concurrently decoded instructions that includes the particular instruction, and the offset tag identifies which instruction within the set corresponds to the particular instruction. It is noted that storing instruction results into register file 30 and freeing the corresponding storage is referred to as "retiring" the instructions. It is further noted that any reorder buffer configuration may be employed in various embodiments of processor 10.
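The two-field tag described above can be illustrated with a short Python sketch. It is only a model under stated assumptions: the description gives three instructions per line but no field widths, so the 2-bit offset field and the names `make_tag` and `split_tag` are hypothetical.

```python
from typing import Tuple

# Assumption: three instructions per line (as in the embodiment), so a
# 2-bit offset field is wide enough. The text gives no actual field widths.
INSTRUCTIONS_PER_LINE = 3
OFFSET_BITS = 2
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_tag(line: int, offset: int) -> int:
    """Combine a line tag and an offset tag into a single tag value."""
    if line < 0:
        raise ValueError("line tag must be non-negative")
    if not 0 <= offset < INSTRUCTIONS_PER_LINE:
        raise ValueError("offset tag must select an instruction within the line")
    return (line << OFFSET_BITS) | offset


def split_tag(tag: int) -> Tuple[int, int]:
    """Return (line_tag, offset_tag) for a tag built by make_tag."""
    return tag >> OFFSET_BITS, tag & OFFSET_MASK
```

With this layout, the line tag alone is enough to free all three entries of a line at once, which is the simplification the line-oriented organization is meant to provide.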

As noted earlier, reservation stations 22 store instructions until the instructions are executed by the corresponding functional unit 24. An instruction is selected for execution if (i) the operands of the instruction have been provided; and (ii) the operands have not yet been provided for instructions that are within the same reservation station 22A-22C and that precede the instruction in program order. When an instruction is executed by one of the functional units 24, the result of that instruction is passed directly to any reservation stations 22 that are waiting for that result, at the same time as the result is passed to update reorder buffer 32 (this technique is commonly referred to as "result forwarding"). An instruction may be selected for execution and passed to a functional unit 24A-24C during the clock cycle in which the associated result is forwarded. In this case, reservation stations 22 route the forwarded result to the functional unit 24. In embodiments in which instructions are decoded into multiple operations to be executed by functional units 24, the operations are scheduled separately from one another.
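One reading of the two selection conditions is that a reservation station issues the oldest instruction whose operands are all available. The Python sketch below models that reading only; the `Entry` type, its fields, and `select_for_execution` are illustrative names, not part of the described hardware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    name: str             # illustrative label for the instruction
    operands_ready: bool  # True once every operand value has been provided


def select_for_execution(station: List[Entry]) -> Optional[Entry]:
    """Pick the instruction to issue from one reservation station.

    Assumption: `station` is held in program order, oldest entry first.
    """
    for entry in station:
        if entry.operands_ready:
            # Every older entry was skipped because its operands are still
            # missing, so conditions (i) and (ii) both hold for this entry.
            return entry
    return None  # nothing can issue this cycle
```

For example, in a station holding a stalled instruction followed by two ready ones, the first ready instruction is the one selected.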

In one embodiment, each functional unit 24 is configured to perform integer arithmetic operations of addition and subtraction, as well as shift, transfer, logical, and branch operations. The operations are performed in response to control values decoded for a particular instruction by decode units 20. A floating point unit (not shown) may also be employed to accommodate floating point operations. The floating point unit may be operated as a coprocessor, receiving instructions from MROM unit 34 or reorder buffer 32 and subsequently communicating with reorder buffer 32 to complete the instructions. Additionally, functional units 24 may be configured to perform address generation for load and store memory operations performed by load/store unit 26. In one particular embodiment, each functional unit 24 comprises an address generation unit for generating addresses and an execute unit for performing the remaining functions. The two units may operate independently on different instructions or operations during a clock cycle.

Each functional unit 24 also provides information regarding the execution of conditional branch instructions to branch prediction unit 14. If a branch prediction was incorrect, branch prediction unit 14 flushes the instructions subsequent to the mispredicted branch that have entered the instruction processing pipeline, and causes the required instructions to be fetched from instruction cache 16 or main memory. It is noted that in such situations, results of instructions in the original program sequence that occur after the mispredicted branch instruction are discarded, including those that were speculatively executed and temporarily stored in load/store unit 26 and reorder buffer 32. It is further noted that branch execution results may be provided by functional units 24 to reorder buffer 32, which may indicate branch mispredictions to functional units 24.

Results produced by functional units 24 are sent to reorder buffer 32 if a register value is being updated, and to load/store unit 26 if the contents of a memory location are changed. If the result is to be stored in a register, reorder buffer 32 stores the result in the location reserved for the value of the register when the instruction was decoded. A plurality of result buses 38 are included for forwarding results from functional units 24 and load/store unit 26. Result buses 38 convey the result generated, as well as the reorder buffer tag identifying the instruction that was executed.

Load/store unit 26 provides an interface between functional units 24 and data cache 28. In one embodiment, load/store unit 26 is configured with a first load/store buffer and a second load/store buffer. The first load/store buffer has storage locations for data and address information for pending loads or stores that have not yet accessed data cache 28. The second load/store buffer has storage locations for data and address information for loads and stores that have accessed data cache 28. For example, the first buffer may comprise 12 locations and the second buffer may comprise 32 locations. Decode units 20 arbitrate for access to load/store unit 26. When the first buffer is full, a decode unit must wait until load/store unit 26 has room for the pending load or store request information. Load/store unit 26 also performs dependency checking for load memory operations against pending store memory operations to ensure that data coherency is maintained. A memory operation is a transfer of data between processor 10 and the main memory subsystem (although the transfer may be accomplished in data cache 28). A memory operation may be the result of an instruction that uses an operand stored in memory, or the result of a load/store instruction that causes the data transfer but no other operation.

Data cache 28 is a high-speed cache memory provided to temporarily store data being transferred between load/store unit 26 and the main memory subsystem. In one embodiment, data cache 28 has a capacity of up to 64 kilobytes in a two-way set-associative structure. Data cache 28 may be implemented in a variety of specific memory configurations, including set-associative, fully associative, and direct-mapped configurations, and in any suitable size of any other suitable configuration.

Bus interface unit 37 is configured to communicate between processor 10 and other components of a computer system via a bus. For example, the bus may be compatible with the EV-6 bus developed by Digital Equipment Corporation. Alternatively, any suitable interconnect structure may be used, including packet-based, unidirectional, or bidirectional links. An optional L2 cache interface may likewise be employed for interfacing to a level-two cache.

It is noted that while the embodiment of FIG. 1 is a superscalar implementation, other embodiments may employ scalar implementations. Furthermore, the number of functional units may vary from embodiment to embodiment. Other embodiments may employ a central reservation station rather than the individual reservation stations shown in FIG. 1. Still other embodiments may employ a central scheduler rather than the reservation stations and reorder buffer of FIG. 1.

Processor 10 includes a port, shown here as mailbox port 100. The port allows processor 10 to be queried while it executes an instruction stream. Mailbox port 100 is configured to be coupled to a service processor or debug processor external to processor 10. Test data used to query the state of processor 10 is received via mailbox port 100. Similarly, state data that results from the query and indicates the state of processor 10 is sent from mailbox port 100 to the service processor or debug processor. Mailbox port 100 may be coupled to one or more units of processor 10, such as decode units 20A-20C, register file 30, instruction cache 16, and functional units 24A-24C. Information is sent from or received at mailbox port 100.

FIG. 2 shows a block diagram of one embodiment of a system for determining the state of a processor. In this embodiment, processor 10 (hereinafter host processor 10) is coupled to service processor 140. In some embodiments, service processor 140 is located in another computer system that is coupled to the computer system in which host processor 10 is implemented. In other embodiments, service processor 140 is located on a board inserted into a peripheral slot of the computer system in which the host processor is located, or is mounted on the same board as the host processor.

As noted earlier, host processor 10 includes mailbox port 100. Mailbox port 100 includes mailbox input port 102 and mailbox output port 104. Mailbox input port 102 is configured to be coupled to a complementary output port 152 of service processor 140. Similarly, mailbox output port 104 is configured to be coupled to a complementary input port 154 of service processor 140.

Mailbox input port 102 of host processor 10 receives test data sent from service processor 140. The test data is sent from output port 152 of service processor 140 to mailbox input port 102 of host processor 10. Similarly, service processor 140 is configured to receive state data from host processor 10. The state data is sent from mailbox output port 104 of host processor 10 to input port 154 of service processor 140. Each complementary pair of input/output ports functions independently of the other. That is, output port 152 may send test data to mailbox input port 102 independently of any transfer of state data from mailbox output port 104 to input port 154.

Test data is serially loaded into output port 152 of service processor 140 through a test data input (TDI) pin that is part of a test access port (TAP). The test access port is a boundary scan port compatible with IEEE Standard 1149.1. Similarly, state data is serially shifted out of input port 154 through a test data output (TDO) pin. Additional details on loading test data into, and unloading state data from, service processor 140 are discussed below.

FIG. 3 is a block diagram illustrating one embodiment of a host processor input port coupled to a service processor output port. In the embodiment shown, mailbox input port 102 of host processor 10 is coupled to output port 152 of service processor 140. Mailbox input port 102 includes input register 112, which is configured to receive and store test data. The test data is received from output register 162, which is located in output port 152. Both output register 162 and input register 112 may be of various sizes. In one embodiment, output register 162 and input register 112 are each configured to store 32 bits of data and one valid bit. Other embodiments having different register sizes, as well as multiple registers (for both the mailbox input port and the mailbox output port), are possible and contemplated.

To determine the state of host processor 10, the valid bit of input register 112 must be queried following the loading of test data into output register 162. As noted earlier, data is loaded into output port 152 through a boundary scan TAP compatible with the IEEE 1149.1 standard. In the embodiment shown, output register 162 is coupled to the TDI pin. Test data is serially shifted into output register 162 through the TDI pin. In addition to the test data, a valid bit is also set in the output register through the TDI pin.

Before test data can be sent to input register 112, the valid bit must first be polled to confirm that it is clear. Because this bit resides in the clock domain of the host processor, it must first be synchronized to the clock domain of the service processor. The synchronized version of the valid bit is captured into output register 162 and serially shifted out for examination. While the valid bit is being shifted out, the test data is serially shifted into output register 162.

If polling finds the valid bit of input register 112 set, test data has been loaded into mailbox input port 102 but has not yet been read by host processor 10. Accordingly, loading test data into output register 162 and subsequently transferring it to input register 112 is inhibited while this valid bit is set. After reading the test data, the host processor clears the valid bit in input register 112. In response to detecting that the valid bit stored in input register 112 is clear, output port 152 initiates the transfer of test data to input register 112. In this embodiment, a separate TAP instruction is used to set the valid bit. Use of this instruction causes a synchronous load of the data from output register 162 into input register 112 (see "A" in FIG. 3).

The processor core of host processor 10 also polls the valid bit of input register 112. After detecting that the valid bit is set, which indicates a successful load of test data into input register 112, the processor core reads the test data. The test data is then used to generate state data, which is returned to the service processor. Further details are discussed below.

The protocol for sending and receiving between an input port and an output port is the same for both host processor 10 and service processor 140. Table 1 below illustrates the protocol for one embodiment of an input/output port pair.
Table 1
│ Transmitter                                               │ Receiver                                                │
│ 1. Poll the host processor's valid bit until it is clear. │ 1. Poll the host processor's valid bit until it is set. │
│ 2. Store the data in the register.                        │ 2. Read the data.                                       │
│ 3. Set the host processor's valid bit.                    │ 3. Clear the host processor's valid bit.                │

In general, the transmitter (i.e., the output port) of the system must poll the host processor's valid bit before sending data to the receiver. After detecting that the valid bit is clear, the transmitter stores the data in its output register and transfers it to the input register of the receiver. Successful transfer of the data from the transmitter to the receiver is indicated by setting the valid bit in the input register. The receiver (i.e., the input port) of the system polls the host processor's valid bit until it is set. A set valid bit indicates that valid data is present in the input register and therefore allows the receiver to read the data. Following the read, the host processor's valid bit is cleared.
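The transmitter and receiver steps of Table 1 can be sketched as a small Python model of one mailbox direction. This is a software illustration only; the `Mailbox` class and its method names are hypothetical, and each call stands for a single polling attempt rather than a blocking wait.

```python
from typing import Optional


class Mailbox:
    """One direction of the mailbox: a data register plus a valid bit."""

    def __init__(self) -> None:
        self.data = 0
        self.valid = False

    def try_send(self, value: int) -> bool:
        """Transmitter side: one polling attempt, then steps 2 and 3."""
        if self.valid:      # step 1: earlier data has not been read yet
            return False
        self.data = value   # step 2: store the data in the register
        self.valid = True   # step 3: set the valid bit
        return True

    def try_receive(self) -> Optional[int]:
        """Receiver side: one polling attempt, then steps 2 and 3."""
        if not self.valid:  # step 1: nothing new to read
            return None
        value = self.data   # step 2: read the data
        self.valid = False  # step 3: clear the valid bit
        return value
```

Because the valid bit is the only shared state, the transmitter can never overwrite data the receiver has not consumed, and the receiver never reads the same word twice.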

In various embodiments, the receiver of the service processor may periodically shift data out of its input register and examine the valid bit after the shift (since the valid bit is part of the data shifted out) to determine whether the data is valid.

FIG. 4 shows a block diagram of one embodiment of an output port of the host processor coupled to an input port of the service processor. In the embodiment shown, mailbox output port 104 of host processor 10 is coupled to input port 154 of service processor 140. Mailbox output port 104 includes output register 114, which is configured to store state data received from the processor core of host processor 10, while input port 154 includes input register 164. The state data received by output register 114 is generated in response to previously input test data.

Before any data is written to output register 114, the valid bit stored in output register 114 must be polled to determine whether it is clear. After determining that the valid bit is clear, mailbox output port 104 stores the state data in output register 114 and then sets the valid bit in output register 114. The transfer of data to input register 164 is initiated by service processor 140 and takes place in the Capture-DR state of the TAP controller. Once the state data and valid bit have been transferred to input register 164, they are read by shifting them out through the test data output (TDO) pin, and the valid bit is examined. If the valid bit is set, service processor 140 determines that the data shifted out with the valid bit is valid.

Data transfers into the mailbox input port and out of the mailbox output port are controlled by a set of instructions for both the service processor and the host processor. In one embodiment, the TAP instructions for the service processor are MBOXIN, MBOXINSETV, MBOXOUT, and MBOXOUTCLRV. The MBOXIN instruction is used by service processor 140 to transfer test data from output port 152 to mailbox input port 102. Service processor 140 executes the MBOXINSETV instruction in response to the transfer of test data, thereby setting the valid bit in input register 112 and indicating to host processor 10 that valid data is present. The MBOXOUT instruction is executed to initiate the transfer of state data from host processor 10 to service processor 140. The MBOXOUTCLRV instruction is executed to clear the valid bit stored in output register 114.
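The order in which a service processor might issue these four TAP instructions can be sketched in Python. The `SimulatedTap` class below is a toy stand-in, not a real JTAG driver: the instruction names come from the description, but the method signatures, the absence of opcodes, and the choice to model the register load as part of MBOXINSETV are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


class SimulatedTap:
    """Toy model of the host's TAP; it logs instructions and tracks mailbox bits."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.in_data, self.in_valid = 0, False    # models input register 112
        self.out_data, self.out_valid = 0, False  # models output register 114
        self._shifted = 0                         # models output register 162

    def mboxin(self, test_data: int) -> bool:
        """Shift test data in; return the input valid bit that was shifted out."""
        self.log.append("MBOXIN")
        self._shifted = test_data
        return self.in_valid

    def mboxinsetv(self) -> None:
        """Assumed behavior: load the shifted data and set the input valid bit."""
        self.log.append("MBOXINSETV")
        self.in_data, self.in_valid = self._shifted, True

    def mboxout(self) -> Tuple[int, bool]:
        """Capture and shift out the state data together with its valid bit."""
        self.log.append("MBOXOUT")
        return self.out_data, self.out_valid

    def mboxoutclrv(self) -> None:
        """Clear the valid bit of the host's output register."""
        self.log.append("MBOXOUTCLRV")
        self.out_valid = False


def send_test_data(tap: SimulatedTap, test_data: int) -> bool:
    if tap.mboxin(test_data):  # valid still set: host has not read earlier data
        return False
    tap.mboxinsetv()
    return True


def receive_state_data(tap: SimulatedTap) -> Optional[int]:
    data, valid = tap.mboxout()
    if not valid:
        return None
    tap.mboxoutclrv()
    return data
```

A caller would typically retry `send_test_data` until it returns `True`, then call `receive_state_data` periodically until it returns a value.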

Instructions executed by host processor 10 include model specific register (MSR) read and write instructions to the mnemonics UC_SPREG_MBOX_IN and UC_SPREG_MBOX_OUT. An MSR read of UC_SPREG_MBOX_IN accesses mailbox input register 112 and reads the test data from the register. An MSR write to UC_SPREG_MBOX_IN is used to clear the valid bit. An MSR write to UC_SPREG_MBOX_OUT accesses mailbox output register 114 to store state data or set the valid bit. An MSR read of UC_SPREG_MBOX_OUT is used to detect that the valid bit is clear.
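The host-side use of the two MSRs can be sketched in Python as follows. Everything beyond the two mnemonics is an assumption: the description does not give MSR addresses or bit layouts, so the sketch uses a dictionary-backed `SimulatedMsrs` object, places the valid bit at bit 32, and stores state data and sets the valid bit with a single write.

```python
from typing import Callable, Dict

UC_SPREG_MBOX_IN = "UC_SPREG_MBOX_IN"
UC_SPREG_MBOX_OUT = "UC_SPREG_MBOX_OUT"
DATA_MASK = (1 << 32) - 1
VALID = 1 << 32  # assumed position of the valid bit


class SimulatedMsrs:
    """Dictionary-backed stand-in for the mailbox MSRs (no hardware access)."""

    def __init__(self) -> None:
        self.regs: Dict[str, int] = {UC_SPREG_MBOX_IN: 0, UC_SPREG_MBOX_OUT: 0}

    def read(self, name: str) -> int:
        return self.regs[name]

    def write(self, name: str, value: int) -> None:
        self.regs[name] = value


def host_service_mailbox(msrs: SimulatedMsrs,
                         make_state_data: Callable[[int], int]) -> bool:
    """One polling pass of a hypothetical host routine; True if a query was answered."""
    incoming = msrs.read(UC_SPREG_MBOX_IN)
    if not incoming & VALID:
        return False                       # no new test data yet
    if msrs.read(UC_SPREG_MBOX_OUT) & VALID:
        return False                       # earlier state data not yet collected
    test_data = incoming & DATA_MASK
    msrs.write(UC_SPREG_MBOX_IN, 0)        # write clears the input valid bit
    state = make_state_data(test_data) & DATA_MASK
    msrs.write(UC_SPREG_MBOX_OUT, state | VALID)  # store state data, set valid
    return True
```

In this sketch the routine leaves the test data in place when the output register is still occupied, so no query is lost while the service processor catches up.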

It is noted that the mnemonics associated with the various instructions are examples for one particular embodiment. Embodiments having other mnemonics describing the particular instructions used with the input and output ports are possible and contemplated.

It is noted that output register 162 and input register 164 of service processor 140 may be any shift register of the TAP between the TDI pin and the TDO pin, provided that the shift register has enough bit positions to store the test or state data and at least one valid bit. It is further noted that other embodiments of service processor 140 are possible and contemplated in which test data is loaded and state data is read through mechanisms other than the TAP described above. Additionally, for the purposes of this discussion, the terms service processor and debug processor are interchangeable; a debug processor configured in accordance with the discussion herein may be substituted for the service processor.

FIG. 5A shows a flow diagram of one embodiment of a method for querying a host processor. Method 500 begins by polling the valid bit in the input register of the host processor (item 502). During polling, it is determined whether the valid bit is clear or set (item 504). If the valid bit is set, polling continues; a set valid bit indicates that the input register of the host processor contains valid data that has not yet been read. Once the valid bit is found to be clear, test data is loaded into the output register of the service processor (item 506). The test data is loaded serially through the TDI pin as described above, although other embodiments in which the test data is loaded in parallel are possible and contemplated. Following the loading of test data into the output register, the output port of the service processor may transfer the data to the input register of the host processor (item 508) and set the valid bit (item 510). Setting the valid bit in the input register of the host processor indicates to the host processor that valid data is present and ready to be read. In response to detecting that the valid bit is set, the host processor reads the test data and generates state data based on the test data (item 512). The valid bit stored in the input register of the mailbox input port may be cleared by the host processor after it has read the test data.

FIG. 5B is a flow diagram of one embodiment of a method for outputting state data from the host processor. The state data is generated by the method described with reference to FIG. 5A and the mechanisms described with reference to FIGS. 1-4. Method 550 begins by polling the valid bit of the output register of the host processor (item 552). As in the method described with reference to FIG. 5A, polling continues until the valid bit is determined to be clear (item 554). Once the valid bit is determined to be clear, the host processor stores the state data in the output register of its mailbox output port (item 556). Following the loading of state data into the output register, the host processor sets the valid bit in the output register (item 560). The service processor transfers the valid bit and data in the Capture-DR state and then shifts them out of its input register (item 562). If the transferred valid bit is set, the data is identified as valid. In response to the state data being read from the input register, the valid bit of the host processor may be cleared, allowing additional state data to be sent from the host processor. Importantly, the operations described with reference to FIG. 5B may occur independently of the operations described with reference to FIG. 5A.
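The capture-then-shift step (item 562) can be modeled bit by bit in Python. This is a simplified sketch: the 33-bit width follows the 32-data-bit, one-valid-bit embodiment described earlier, while the position of the valid bit (bit 32), the least-significant-bit-first shift order, and the function names are assumptions.

```python
from typing import List, Tuple

WIDTH = 33  # 32 data bits plus one valid bit


def capture_dr(state_data: int, valid: bool) -> int:
    """Parallel-load the input register with the host's data and valid bit."""
    return (state_data & 0xFFFFFFFF) | (int(valid) << 32)


def shift_out(register: int) -> List[int]:
    """Shift the register out one bit per clock, least significant bit first."""
    bits = []
    for _ in range(WIDTH):
        bits.append(register & 1)
        register >>= 1
    return bits


def decode(bits: List[int]) -> Tuple[int, bool]:
    """Rebuild (state_data, valid) from the bits observed on the output pin."""
    word = sum(bit << position for position, bit in enumerate(bits))
    return word & 0xFFFFFFFF, bool(word >> 32)
```

Because the valid bit travels in the same scan as the data, the service processor needs only one shift sequence to learn both the value and whether it is meaningful.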

It is noted that the term "test data" as used herein refers to any type of data input to host processor 10 in order to receive a desired response. Similarly, the term "state data" refers to any type of data used to provide information about the operation of host processor 10. It is further noted that the loading of test data and/or the reading of state data may be performed either in response to an individual query of the processor or as a periodically performed operation. Alternative embodiments are possible and contemplated in which the particular order of polling valid bits and transferring data differs from that described herein.

Although the invention has been described with reference to particular embodiments, it will be understood that the embodiments are illustrative and that the scope of the invention is not limited to them. Variations, modifications, additions, and improvements to the embodiments are possible. These variations, modifications, additions, and improvements may fall within the scope of the invention as set forth in the following claims.

The present invention is generally applicable to processors.

FIG. 1 is a block diagram of one embodiment of a processor.
FIG. 2 is a block diagram of one embodiment of a system for determining the state of a processor.
FIG. 3 is a block diagram illustrating one embodiment of a host processor input port coupled to a service processor output port.
FIG. 4 is a block diagram of one embodiment of a host processor output port coupled to a service processor input port.
FIG. 5A is a flow diagram of one embodiment of a method for querying a host processor.
FIG. 5B is a flow diagram of one embodiment of a method for outputting state data from a host processor.

Claims

1. A method for determining the state of a host processor, comprising:
polling a first input register located in the host processor for a first valid bit until the first valid bit is clear (502);
loading test data into a first output register located in a service processor (506);
transmitting the test data from the first output register to the first input register (508);
setting the first valid bit (510) after the transmission is complete; and
reading the test data from the first input register in response to detecting that the first valid bit is set (512);
wherein the loading, the polling, the transmitting, the setting, and the reading do not interrupt an instruction stream executed by the host processor.

2. The method of claim 1, further comprising:
determining the state of the host processor based on the test data; and
outputting state data to the service processor, wherein the state data indicates the state of the host processor, and wherein the determining and the outputting are performed asynchronously with respect to a host processor clock.

3. The method of claim 2, wherein the outputting comprises:
polling a second output register for a second valid bit until the second valid bit is clear (552);
loading state data into the second output register (556);
sending the state data from the second output register to a second input register located in the service processor (558); and
outputting the state data from the second input register (562).

4. The method of claim 3, wherein the first valid bit is cleared in response to reading the test data.

5. The method of claim 3, wherein the second valid bit is set following loading of the second input register.

6. A system comprising a host processor (10) and a service processor (140), wherein the host processor comprises a first input register (102), the service processor comprises a first output register (152), the host processor comprises a second output register (104), and the service processor comprises a second input register (154), wherein:
the service processor is configured to determine the state of the host processor, the determining comprising:
polling a first valid bit of the first input register until the first valid bit is clear;
loading test data into the first output register;
transmitting the test data from the first output register to the first input register; and
setting the first valid bit after the transmission is complete;
the host processor is configured to read the test data from the first input register in response to the first valid bit being set; and
the loading, the polling, the transmitting, the setting, and the reading do not interrupt an instruction stream executed by the host processor.

7. The system of claim 6, wherein the host processor is further configured to:
determine the state of the host processor based on the test data; and
output state data to the service processor, wherein the state data indicates the state of the host processor, and wherein the determining and the outputting are performed asynchronously with respect to a host processor clock.

8. The system of claim 7, wherein the host processor is further configured to:
poll a second valid bit of the second output register until the second valid bit is clear;
store state data in the second output register; and
transmit the state data from the second output register to the second input register;
and wherein the service processor is configured to output the state data received by the second input register.

9. The system of claim 8, wherein the host processor is configured to clear the first valid bit in response to reading the test data.

10. The system of claim 8, wherein the host processor is configured to set the second valid bit in response to storing state data in the second output register.