JP2020198564A

JP2020198564A - Image forming system, image forming apparatus, control program for image forming apparatus, and control method for image forming apparatus

Info

Publication number: JP2020198564A
Application number: JP2019104257A
Authority: JP
Inventors: 和弥 ▲高▼橋; Kazuya Takahashi
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2019-06-04
Filing date: 2019-06-04
Publication date: 2020-12-10
Anticipated expiration: 2039-06-04
Also published as: JP7379872B2

Abstract

To prevent a user from feeling discomfort. SOLUTION: An image forming system comprises a smart speaker, a first information processing apparatus, and an MFP 140. The smart speaker receives input of a first voice, outputs a second voice upon input of the first voice, and transmits the first voice. Upon receiving the first voice from the smart speaker, the first information processing apparatus transmits language information that can specify the language of the second voice. The MFP 140 includes: an acquisition unit 1402 that acquires the language of the second voice specified from the language information transmitted from the first information processing apparatus; a setting unit 1408 that sets the language; a switching unit 1406 that switches the language set by the setting unit 1408 to the language of the second voice acquired by the acquisition unit 1402; and a speaker 35 that outputs a third voice in the language switched by the switching unit 1406. SELECTED DRAWING: Figure 5

Description

The present disclosure relates to an image forming system, an image forming apparatus, a control program for the image forming apparatus, and a control method for the image forming apparatus.

In recent years, image forming apparatuses that support not just one language but a plurality of languages have been proposed. Patent Document 1 proposes an image forming apparatus that, when outputting by voice the processing state of a job instructed by a user, outputs the processing state in the language corresponding to that user without requiring the user to switch the language setting of the image forming apparatus.

Japanese Unexamined Patent Publication No. 2011-59958

Meanwhile, to improve user convenience, a technique of connecting an image forming apparatus and a smart speaker over a network is conceivable. With this technique, the user inputs a job (such as copying) to the image forming apparatus by speaking an input voice to the smart speaker. When the input voice is input, the smart speaker outputs a response voice to it, for example in the language corresponding to the input voice. For example, when the user inputs the Japanese input voice "コピー１枚" ("one copy"), the smart speaker outputs the Japanese response voice "分かりました" ("Understood"). When the user inputs the English input voice "Copy one", the smart speaker outputs the English response voice "OK".

It is also conceivable, for example, to have the smart speaker output a completion voice indicating that the job input by the user's input voice has been completed. However, keeping the image forming apparatus and the smart speaker connected for a long time is undesirable, for example because noise may be input to the smart speaker. The connection time between the image forming apparatus and the smart speaker is therefore preferably short. If the connection time is short, however, the image forming apparatus can no longer send the smart speaker a signal indicating that the job has been completed, and as a result the smart speaker cannot output the completion voice.

It is therefore conceivable to have the image forming apparatus output the completion voice. Conventionally, however, connecting a smart speaker to an image forming apparatus has not been known. Even if one were to conceive of connecting a smart speaker to an image forming apparatus, the relationship between the language of the response voice from the smart speaker and the language of the completion voice from the image forming apparatus has not been considered. Consequently, when the two languages differ, there has been a problem that the user feels a sense of discomfort.

The present disclosure has been devised in view of these circumstances, and relates to an image forming system, an image forming apparatus, an image forming program, and an image forming method that prevent the user from feeling discomfort.

According to one aspect of the present disclosure, there is provided an image forming system including a voice input/output device, an information processing device, and an image forming apparatus. The voice input/output device receives input of a first voice, outputs a second voice when the first voice is input, and transmits the first voice. The information processing device, upon receiving the first voice from the voice input/output device, transmits language information from which the language of the second voice can be identified. The image forming apparatus includes: a language acquisition unit that acquires the language of the second voice identified from the language information transmitted from the information processing device; a setting unit that sets a language; a switching unit that switches the language set by the setting unit to the language of the second voice acquired by the language acquisition unit; and an output unit that outputs a third voice in the language switched to by the switching unit.

In one aspect, the information processing device detects the language of the first voice transmitted from the voice input/output device, and acquires the detected language of the first voice as the language of the second voice to be transmitted as the language information.

In one aspect, the information processing device transmits, as the language information, the language of the second voice set in the voice input/output device.

In one aspect, the image forming system further includes a first providing device that provides voice data. The image forming apparatus further includes a voice data acquisition unit that, when the image forming apparatus does not store voice data corresponding to the language to be switched to by the switching unit, acquires that voice data from the first providing device. The output unit outputs the third voice based on the voice data acquired by the voice data acquisition unit.

In one aspect, when the output unit outputs the third voice before the voice data has been acquired from the first providing device, it outputs the third voice not in the language switched to by the switching unit but in the language that was set before the switch.
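The voice data handling in the two aspects above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the disclosed implementation: the function name `select_completion_voice`, the dictionary standing in for voice data stored in the image forming apparatus, and the `fetch` callback standing in for the first providing device are all hypothetical.

```python
from typing import Callable, Dict, Optional


def select_completion_voice(
    local_voice_data: Dict[str, str],
    previous_language: str,
    switched_language: str,
    fetch: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Pick the voice data used for the third (completion) voice.

    local_voice_data: hypothetical map of language code -> stored voice data.
    fetch: hypothetical stand-in for the first providing device; it returns
           None when the data has not been acquired by the time of output.
    """
    # Voice data for the switched language is already stored: use it.
    if switched_language in local_voice_data:
        return local_voice_data[switched_language]
    # Not stored: try to acquire it from the first providing device.
    if fetch is not None:
        fetched = fetch(switched_language)
        if fetched is not None:
            return fetched
    # Not acquired in time: fall back to the language set before the switch.
    return local_voice_data[previous_language]
```

With only Japanese data stored and no data obtained, the call returns the Japanese completion voice; once `fetch` can supply English data, the English voice is returned instead.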

In one aspect, the image forming apparatus further includes a display unit that displays an image and a display control unit that controls the display unit, and the display control unit controls the display unit to display an image in the language switched to by the switching unit.

In one aspect, the image forming system further includes a second providing device that provides image data. The image forming apparatus further includes an image data acquisition unit that, when the image forming apparatus does not store image data corresponding to the language to be switched to by the switching unit, acquires that image data from the second providing device. The display control unit causes the display unit to display an image based on the image data acquired by the image data acquisition unit.

In one aspect, when a job based on the first voice transmitted from the information processing device is completed, the display control unit controls the display unit to display an image in the language that was set before the switch by the switching unit.

In one aspect, when the image forming apparatus determines that the connection between the voice input/output device and the image forming apparatus has been lost, the display control unit controls the display unit to display an image in the language that was set before the switch by the switching unit.

In one aspect, the image forming apparatus further includes a prohibition unit that prohibits the control of displaying an image in the language switched to by the switching unit.
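The display-related aspects above (switching the displayed language, reverting on job completion or disconnection, and prohibiting the switch) can be pictured as a small state holder. The following is an illustrative Python sketch only; the class `DisplayLanguageController` and its method names are assumptions and do not appear in the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class DisplayLanguageController:
    """Hypothetical model of the display control unit's language handling."""

    set_language: str                    # language set by the setting unit
    switching_prohibited: bool = False   # state of the prohibition unit
    shown_language: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # Initially the display uses the set language.
        self.shown_language = self.set_language

    def on_language_acquired(self, language: str) -> None:
        # Display in the switched language unless the prohibition unit forbids it.
        if not self.switching_prohibited:
            self.shown_language = language

    def on_job_completed(self) -> None:
        # Revert to the language that was set before the switch.
        self.shown_language = self.set_language

    def on_session_disconnected(self) -> None:
        # Same reversion when the connection to the smart speaker is lost.
        self.shown_language = self.set_language
```

In this sketch the prohibition flag affects only the displayed language, matching the aspect above, which prohibits display switching rather than voice switching.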

In one aspect, when a job based on the first voice transmitted from the information processing device is completed, the output unit becomes able to output voice in the language that was set before the switch by the switching unit.

In one aspect, when an error occurs, the output unit outputs a voice indicating that the error has occurred and a voice indicating the content of the error in the language that was set before the switch by the switching unit.

In one aspect, when an error occurs, the output unit outputs a voice indicating that the error has occurred in the language switched to by the switching unit, and outputs a voice indicating the content of the error in the language that was set before the switch.
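The two error-handling aspects above differ only in the language used for the "an error has occurred" notice. A minimal Python sketch of that choice follows; the function `error_voices`, the `notice_in_switched_language` flag, and the phrase table (including the paper-jam example) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase table; the disclosure does not list these phrases.
PHRASES: Dict[str, Dict[str, str]] = {
    "ja": {"error_occurred": "エラーが発生しました", "paper_jam": "紙詰まりです"},
    "en": {"error_occurred": "An error has occurred", "paper_jam": "Paper jam"},
}


def error_voices(
    previous_language: str,
    switched_language: str,
    error_key: str,
    notice_in_switched_language: bool,
) -> List[Tuple[str, str]]:
    """Return (language, phrase) pairs in the order they would be output."""
    # The notice uses the switched language only in the second aspect.
    notice_language = (
        switched_language if notice_in_switched_language else previous_language
    )
    return [
        (notice_language, PHRASES[notice_language]["error_occurred"]),
        # In both aspects the error content uses the pre-switch language.
        (previous_language, PHRASES[previous_language][error_key]),
    ]
```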

In one aspect, the output unit outputs, as the third voice, a voice indicating that a job based on the voice transmitted from the voice input/output device has been completed.

According to another aspect of the present disclosure, there is provided an image forming apparatus including: a language acquisition unit that acquires language information from which the language of a second voice can be identified, the second voice being output from a voice input/output device when a first voice from a user is input to the voice input/output device; a setting unit that sets a language; a switching unit that switches the language set by the setting unit to the language of the second voice acquired by the language acquisition unit; and an output unit that outputs a third voice in the language switched to by the switching unit.

In one aspect, the image forming apparatus further includes a voice data acquisition unit that, when voice data corresponding to the language to be switched to by the switching unit is not stored, acquires that voice data from a first providing device, and the output unit outputs the third voice based on the voice data acquired by the voice data acquisition unit.

In one aspect, the image forming apparatus further includes a display unit that displays an image and a display control unit that controls the display unit, and the display control unit controls the display unit to display an image in the language switched to by the switching unit.

In one aspect, the image forming apparatus further includes an image data acquisition unit that, when image data corresponding to the language to be switched to by the switching unit is not stored, acquires that image data from a second providing device, and the display control unit causes the display unit to display an image based on the image data acquired by the image data acquisition unit.

In one aspect, when a job based on the first voice is completed, the display control unit controls the display unit to display an image in the language that was set before the switch by the switching unit.

In one aspect, the image forming apparatus further includes a prohibition unit that prohibits the control of displaying an image in the language switched to by the switching unit.

In one aspect, when a job based on the first voice is completed, the output unit becomes able to output voice in the language that was set before the switch by the switching unit.

According to another aspect of the present disclosure, there is provided a control program for an image forming apparatus, the program causing a computer to execute the steps of: acquiring language information from which the language of a second voice can be identified, the second voice being output from a voice input/output device when a first voice from a user is input to the voice input/output device; switching a set language to the language of the second voice; and outputting a third voice in the switched language.

According to another aspect of the present disclosure, there is provided a control method for an image forming apparatus, the method including the steps of: acquiring language information from which the language of a second voice can be identified, the second voice being output from a voice input/output device when a first voice from a user is input to the voice input/output device; switching a set language to the language of the second voice; and outputting a third voice in the switched language.

According to the present disclosure, the user can be prevented from feeling discomfort.

A diagram for explaining an application example of the present embodiment.
A diagram for explaining a configuration example of the image forming system of the present embodiment.
A diagram showing the hardware configuration of the MFP of the present embodiment.
A diagram showing a functional configuration example of the information processing device of the present embodiment.
A diagram showing a functional configuration example of the MFP of the present embodiment.
A diagram showing an example of an image displayed on the display of the MFP of the present embodiment.
A diagram showing an example of a voice database.
A diagram showing an example of an image database.
A flowchart of the image forming system of the present embodiment.
A flowchart of the first process.
A flowchart of the second process.
A flowchart of the third process.

An image forming apparatus according to an embodiment of the present invention will be described below with reference to the drawings. In the embodiments described below, when a number, quantity, or the like is mentioned, the scope of the present invention is not necessarily limited to that number, quantity, or the like unless otherwise specified. Identical and equivalent components are given the same reference numerals, and duplicate descriptions may be omitted. It is intended from the outset that the configurations of the embodiments may be used in appropriate combination.

<This Embodiment>
[Application Example]
First, an application example of the present embodiment will be described. FIG. 1 is a diagram for explaining the application example. The image forming system of the present embodiment includes a smart speaker 20 and an MFP 140 (MultiFunction Printer). For example, when an input voice is input to the smart speaker 20, an information processing device or the like described later establishes a session between the smart speaker 20 and the MFP 140. The smart speaker 20 receives the input voice and outputs a response voice to it in the language of the input voice. The MFP 140 can output voice in a set language among a plurality of languages (for example, Japanese and English). The user can freely set the voice to be output through a selection operation. In the MFP 140, Japanese is set as the default set language. The MFP 140 also executes a job based on the input voice (for example, an image forming job).

As shown in FIG. 1(A), suppose that the user U utters the input voice "Please make two color copies" in English. Then, as shown in FIG. 1(B), the smart speaker 20 outputs the response voice "OK" in English, the same language as the input voice. This allows the user to recognize that the smart speaker 20 has accepted the input voice.

As the job based on the input voice, the MFP 140 executes a job of making two color copies. The MFP 140 then outputs a completion voice when the job is completed (finished).

Suppose that an image forming system of a comparative example is configured by connecting an image forming apparatus and a smart speaker over a network in order to improve user convenience. In the image forming system of the comparative example, the relationship between the language of the response voice of the smart speaker 20 and the language of the completion voice of the MFP has not been considered. Therefore, as shown in FIG. 1(C), the MFP 140X outputs the completion voice in Japanese, the default set language. In the example of FIG. 1(C), the completion voice is the Japanese voice "コピーが終了しました" ("Copying has finished"). In the image forming system of the comparative example, as indicated by the * mark in FIG. 1(C), the language of the response voice and the language of the completion voice differ, which causes the problem that the user feels a sense of discomfort regarding the language.

In contrast, in the image forming system including the MFP 140 of the present embodiment, an information processing device described later acquires the language of the input voice by analyzing the input voice input to the smart speaker 20. The information processing device then transmits the acquired language to the MFP 140. The MFP 140 switches its set language from the currently set language (Japanese) to the acquired language of the input voice (English). Therefore, as shown in FIG. 1(D), the MFP 140 outputs the completion voice in English, the language switched to. In the example of FIG. 1(D), the completion voice is the English voice "Copy finished". In the image forming system of the present embodiment, as indicated by the * mark in FIG. 1(D), the language of the response voice and the language of the completion voice are the same, so the user can be prevented from feeling a sense of discomfort regarding the language.

A configuration in which the smart speaker 20 outputs the completion voice is also conceivable. However, continuing the session between the smart speaker 20 and the MFP 140 for a long time is undesirable for the following reason. While the session is kept open for a long time, noise or the like may be input to the smart speaker 20. When noise or the like is input, the smart speaker 20 or the information processing device described later cannot perform voice recognition on it. As a result, the smart speaker 20 may frequently output a response voice asking the user to repeat the request (for example, "もう一度言ってください" ("Please say that again")), or erroneous voice recognition may be performed by the MFP 140, causing the MFP 140 to execute erroneous processing. It is therefore undesirable to continue the session between the smart speaker 20 and the MFP 140 for a long time.

It is also conceivable to provide the MFP 140 itself with a voice input/output function instead of connecting a smart speaker to the MFP 140. However, providing the MFP 140 with a voice input/output function increases its manufacturing cost. In the present embodiment, therefore, the MFP 140 is provided with a voice output function but not with a voice input function.

[Functional Configuration Example of Image Forming System]
FIG. 2 is a diagram showing a functional configuration example of the image forming system 1000 of the present embodiment. As shown in FIG. 2, the image forming system 1000 includes a smart speaker 20, a first network 40, a first information processing device 60, a second network 80, a second information processing device 100, a third network 120, an MFP 140, a first providing device 150, and a second providing device 160.

The smart speaker 20 corresponds to the "voice input/output device" of the present disclosure. The MFP 140 corresponds to the "image forming apparatus" of the present disclosure. The first information processing device 60 corresponds to the "information processing device" of the present disclosure. The first information processing device 60, the second information processing device 100, the first providing device 150, and the second providing device 160 are typically configured as cloud devices.

Typically, the smart speaker 20 and the first information processing device 60 are made by the same manufacturer. Likewise, the second information processing device 100 and the MFP 140 are made by the same manufacturer.

The "input voice" is a voice uttered by the user and input to the smart speaker 20. The input voice corresponds to the "first voice" of the present disclosure. The input voice is uttered by the user in either Japanese or a language other than Japanese (for example, English or Chinese). The MFP 140 can execute a job based on the job voice included in the input voice.

The "response voice" is a voice emitted from the smart speaker 20. The response voice corresponds to the "second voice" of the present disclosure. The response voice is a voice that responds to the input voice input to the smart speaker 20.

The "completion voice" is a voice emitted from the MFP 140. The completion voice corresponds to the "third voice" of the present disclosure. The completion voice indicates that the job based on the user's input voice has been completed by the MFP 140.

The user U can input a job to the MFP 140 by speaking an input voice to the smart speaker 20. The voice data of the input voice is input to the first information processing device 60 via the first network 40. The first information processing device 60 analyzes the voice data of the input voice to detect the language of the voice and the content of the job included in the voice. The first information processing device 60 transmits, to the smart speaker 20, voice data representing a response voice in the detected language. The smart speaker 20 outputs the response voice based on that voice data.

The first information processing device 60 transmits the language of the voice and the content of the job included in the voice to the second information processing device 100 via the second network 80. The second information processing device 100 converts the language of the voice and the content of the job into the format of a command for the MFP 140, and transmits the command to the MFP 140 via the third network 120.
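The conversion into a command for the MFP 140 described above could look like the following. The disclosure does not specify the command format, so this Python sketch assumes a JSON encoding purely for illustration; the function names `to_mfp_command` and `parse_mfp_command` and the field names `language` and `job` are hypothetical.

```python
import json
from typing import Any, Dict, Tuple


def to_mfp_command(language: str, job: Dict[str, Any]) -> str:
    """Pack the detected language and job content into one command string.

    JSON is an assumed encoding; the actual command format is not disclosed.
    """
    command = {"language": language, "job": job}
    return json.dumps(command, sort_keys=True)


def parse_mfp_command(command: str) -> Tuple[str, Dict[str, Any]]:
    """Recover the language and job content on the receiving side."""
    data = json.loads(command)
    return data["language"], data["job"]
```

Carrying the language in the same command as the job means the receiving side obtains both at once, which matches the flow in which the MFP 140 receives the language together with the job content.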

The MFP 140 acquires the language of the voice and the content of the job. The MFP 140 executes a job based on the acquired job content. When the executed job is completed, the MFP 140 outputs, in the acquired language, a completion voice indicating that the job has been completed.
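The sequence above (acquire the language, switch the set language, execute the job, output the completion voice) can be sketched in Python as follows. The class `MfpSketch` and its attributes are hypothetical; only the two completion phrases and the Japanese default are taken from the examples in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Completion phrases taken from the document's Japanese and English examples.
COMPLETION_VOICE: Dict[str, str] = {
    "ja": "コピーが終了しました",
    "en": "Copy finished",
}


@dataclass
class MfpSketch:
    """Hypothetical model of the MFP's language switching and voice output."""

    set_language: str = "ja"                       # default set language
    executed_jobs: List[str] = field(default_factory=list)
    spoken: List[str] = field(default_factory=list)

    def handle(self, language: str, job: str) -> None:
        # Switch the set language to the acquired language.
        self.set_language = language
        # Execute the job (recorded here instead of driving real hardware).
        self.executed_jobs.append(job)
        # Output the completion voice in the switched language.
        self.spoken.append(COMPLETION_VOICE[self.set_language])
```

For example, `MfpSketch().handle("en", "two color copies")` leaves `"Copy finished"` in `spoken`, mirroring FIG. 1(D).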

The image forming system 1000 of the present embodiment can make the input voice, the response voice, and the completion voice all the same language.

For example, when the input voice is the Japanese voice "カラーコピー２枚して下さい" ("Please make two color copies"), the Japanese response voice "分かりました" ("Understood") is output from the smart speaker 20, and the Japanese completion voice "コピーが終了しました" ("Copying has finished") is output from the MFP 140.

When the input voice is the English voice "Please make two color copies", the English response voice "OK" is output from the smart speaker 20, and the English completion voice "Copy finished" is output from the MFP 140.

In this way, because the input voice, the response voice, and the completion voice are all in the same language, the image forming system 1000 of the present embodiment can prevent the user from feeling a sense of discomfort.

[Hardware Configuration Example of MFP]
FIG. 3 is a diagram showing a hardware configuration example of the MFP 140. As shown in FIG. 3, the MFP 140 includes a control unit 31, a fixed storage device 32, a short-range wireless IF (interface) 33, an operation panel 34, a paper feed unit 14, a processing unit 11, a speaker 35, and a network IF 36. The components are connected to the control unit 31 via a bus 38.

制御部３１は、ＣＰＵ（Central Processing Unit）３１１と、制御プログラムが格納されたＲＯＭ（Read Only Memory）３１２と、作業用のＳ−ＲＡＭ（Static Random Access Memory）３１３と、画像形成に関わる各種の設定を記憶するバッテリバックアップされたＮＶ−ＲＡＭ（Non-Volatile RAM：不揮発性メモリ）３１４と、時計ＩＣ（Integrated Circuit）３１５とを有する。 The control unit 31 includes a CPU (Central Processing Unit) 311, a ROM (Read Only Memory) 312 in which a control program is stored, a working S-RAM (Static Random Access Memory) 313, a battery-backed NV-RAM (Non-Volatile RAM) 314 that stores various settings related to image formation, and a clock IC (Integrated Circuit) 315.

ＣＰＵ３１１は、ＲＯＭ３１２等に保存されている動作プログラムを実行することにより、ＭＦＰ１４０全体を総括的に制御する。特に本実施の形態のＭＦＰ１４０は、スマートスピーカー２０に対する入力音声に基づいたジョブを実行することができる。 The CPU 311 comprehensively controls the entire MFP 140 by executing an operation program stored in the ROM 312 or the like. In particular, the MFP 140 of the present embodiment can execute a job based on the input voice to the smart speaker 20.

ＲＯＭ３１２には、ＣＰＵ３１１が実行するプログラムやその他のデータを格納する。Ｓ−ＲＡＭ３１３は、ＣＰＵ３１１がプログラムを実行する際の作業領域となるものであり、プログラムやプログラムを実行する際のデータ等を一時的に保存する。ＮＶ−ＲＡＭ３１４は、バッテリーでバックアップされた不揮発メモリであり、画像形成に係わる各種の設定等を記憶するものである。時計ＩＣ３１５は、時刻を計時すると共に、内部タイマーとして機能し処理時間の計測等を行う。固定記憶装置３２は、ハードディスク等からなり、プログラムや各種データ等を保存する。短距離無線Ｉ／Ｆ３３は、他の装置と短距離無線通信をする。 The ROM 312 stores programs and other data executed by the CPU 311. The S-RAM 313 is a work area when the CPU 311 executes a program, and temporarily stores the program, data when the program is executed, and the like. The NV-RAM 314 is a non-volatile memory backed up by a battery, and stores various settings and the like related to image formation. The clock IC315 measures the time and functions as an internal timer to measure the processing time and the like. The fixed storage device 32 is composed of a hard disk or the like, and stores programs, various data, and the like. The short-range wireless I / F 33 performs short-range wireless communication with other devices.

操作パネル３４は、ディスプレイ１０５１と、タッチパネル１０５２とを含む。操作パネル３４は、表示装置としてのディスプレイ１０５１と、入力装置としてのタッチパネル１０５２とにより構成される。具体的には、操作パネル３４は、ディスプレイ１０５１（たとえば液晶ディスプレイ）上にタッチパネル１０５２を位置決めした上で固定することにより実現される。ディスプレイ１０５１は、文字を含むメニュー画面、ジョブ設定画像、およびオプション設定画面などを表示可能である。なお、タッチスクリーンは、タッチパネルディスプレイ、タッチパネル付きディスプレイ、あるいはタッチパネルモニタとも称される。なお、操作パネル３４においては、タッチ位置の検出方法として、たとえば抵抗膜方式または静電容量方式を用いることができる。 The operation panel 34 includes a display 1051 and a touch panel 1052. The operation panel 34 is composed of a display 1051 as a display device and a touch panel 1052 as an input device. Specifically, the operation panel 34 is realized by positioning and fixing the touch panel 1052 on the display 1051 (for example, a liquid crystal display). The display 1051 can display a menu screen including characters, a job setting image, an option setting screen, and the like. The touch screen is also referred to as a touch panel display, a display with a touch panel, or a touch panel monitor. In the operation panel 34, for example, a resistance film method or a capacitance method can be used as a method for detecting the touch position.

給紙ユニット１４には、画像形成対象の用紙が収容される。処理ユニット１１は、種々の処理を実行するユニットを含む。処理ユニットは、例えば、画像形成ユニット、およびスキャンユニット等を含む。画像形成ユニットは、ユーザー等により指定された複写画像を用紙上に形成する。また、スキャンユニットは、ユーザー等により指定された画像をスキャンする。 The paper feeding unit 14 accommodates the paper to be image-formed. The processing unit 11 includes a unit that executes various processes. The processing unit includes, for example, an image forming unit, a scanning unit, and the like. The image forming unit forms a copy image designated by a user or the like on paper. In addition, the scan unit scans an image specified by the user or the like.

ネットワークＩＦ３６は、第３ネットワーク１２０を介して、種々の情報を送受信することができる。スピーカー３５は、完了音声などの種々の音声を出力する。 The network IF 36 can transmit and receive various information via the third network 120. The speaker 35 outputs various voices such as the completion voice.

［第１情報処理装置の機能構成例］ [Example of functional configuration of the first information processing device]
次に、第１情報処理装置６０の機能構成例を説明する。図４は、第１情報処理装置６０の機能ブロック図である。図４に示すように、第１情報処理装置６０は、検出部６０２と、変換部６０４と、記憶部６０６と、送受信部６０８との機能を有する。 Next, a functional configuration example of the first information processing device 60 will be described. FIG. 4 is a functional block diagram of the first information processing device 60. As shown in FIG. 4, the first information processing device 60 has the functions of a detection unit 602, a conversion unit 604, a storage unit 606, and a transmission / reception unit 608.

検出部６０２は、スマートスピーカー２０に入力された入力音声のデータ（入力音声データ）を、第１ネットワーク４０を経由して取得する。検出部６０２は、入力音声データを解析することにより、入力音声の言語と、該入力音声に含まれるジョブの内容と、オプションの内容と、ジョブ実行指示とを検出する。換言すれば、例えば、検出部６０２は、入力音声に対して音声認識処理を実行することにより、入力音声の言語と、該入力音声に含まれるジョブの内容と、オプションの内容と、ジョブ実行指示とを検出する。 The detection unit 602 acquires, via the first network 40, the data of the input voice input to the smart speaker 20 (input voice data). By analyzing the input voice data, the detection unit 602 detects the language of the input voice, the content of the job included in the input voice, the content of the options, and the job execution instruction. In other words, the detection unit 602 detects these four items by, for example, executing voice recognition processing on the input voice.

以下では、言語を示す情報を「言語情報」という。ジョブの内容を示す情報を「ジョブ情報」という。ジョブの内容は、例えば、画像形成ジョブ、およびスキャンジョブ等を含む。オプションの内容を示す情報を「オプション情報」という。オプションは、例えば、カラー印刷を実行すること、ステープル機能を実行すること、両面印刷を実行すること等を含む。ジョブ実行指示は、「検出されたオプションの内容に基づいて、検出されたジョブを実行させる指示」をいう。 In the following, information indicating a language is referred to as "language information". Information indicating the contents of a job is called "job information". The contents of the job include, for example, an image formation job, a scan job, and the like. Information indicating the contents of options is called "option information". Options include, for example, performing color printing, performing staple functions, performing double-sided printing, and the like. The job execution instruction means "an instruction to execute a detected job based on the contents of the detected option".

例えば、入力音声が、日本語による「カラーコピー２枚して下さい」という音声である場合には、検出部６０２は、言語情報として、「日本語」を検出し、ジョブ情報として、「コピーを２枚実行すること」を検出し、オプション情報として、「カラーコピーを実行すること」を検出し、「して下さい」という用語に基づいて「２枚のカラーコピーを実行する」という指示を検出する。 For example, when the input voice is the Japanese utterance "Please make two color copies", the detection unit 602 detects "Japanese" as the language information, "make two copies" as the job information, and "make color copies" as the option information, and, based on the term "して下さい" ("please"), detects the instruction to "make two color copies".

また、入力音声が、英語による「Please make two color copies」という音声である場合には、検出部６０２は、言語情報として、「英語」を検出し、ジョブ情報として、「コピーを２枚実行すること」を検出し、オプション情報として、「カラーコピーを実行すること」を検出し、「Please」という用語に基づいて「２枚のカラーコピーを実行する」という指示を検出する。 When the input voice is the English utterance "Please make two color copies", the detection unit 602 detects "English" as the language information, "make two copies" as the job information, and "make color copies" as the option information, and, based on the term "Please", detects the instruction to "make two color copies".

言語情報と、ジョブ情報とは、送受信部６０８に出力される。また、言語情報は、変換部６０４に出力される。変換部６０４は、言語情報に基づいて、応答音声データに変換する。記憶部６０６には、例えば、言語情報と、応答音声データとが対応付けられたテーブルが予め記憶されている。このテーブルでは、例えば、日本語である言語情報に対しては、日本語の応答音声の応答音声データが対応づけられている。また、このテーブルでは、英語である言語情報に対しては、英語の応答音声の応答音声データが対応づけられている。 The language information and the job information are output to the transmission / reception unit 608. The language information is also output to the conversion unit 604. The conversion unit 604 converts the language information into response voice data. The storage unit 606 stores in advance, for example, a table in which language information is associated with response voice data. In this table, for example, language information indicating Japanese is associated with the response voice data of a Japanese response voice, and language information indicating English is associated with the response voice data of an English response voice.

変換部６０４は、テーブルを参照して、検出部６０２から取得した言語情報に対応する応答音声データを取得する。変換部６０４は、応答音声データを送受信部６０８に対して送信する。送受信部６０８は、第１ネットワーク４０経由で、応答音声データをスマートスピーカー２０に対して送信する。スマートスピーカー２０は、該応答音声データに基づいて応答音声を出力する。 The conversion unit 604 refers to the table and acquires the response voice data corresponding to the language information acquired from the detection unit 602. The conversion unit 604 transmits the response voice data to the transmission / reception unit 608. The transmission / reception unit 608 transmits the response voice data to the smart speaker 20 via the first network 40. The smart speaker 20 outputs a response voice based on the response voice data.
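
The table lookup performed by the conversion unit 604 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the language codes, the placeholder byte strings standing in for audio, and the name `lookup_response_voice` are all hypothetical.

```python
# Hypothetical stand-in for the table held in the storage unit 606:
# language information -> response voice data (placeholder bytes, not real audio).
RESPONSE_VOICE_TABLE = {
    "ja": b"<audio: wakarimashita>",
    "en": b"<audio: OK>",
}


def lookup_response_voice(language_info: str) -> bytes:
    """Return the response voice data associated with the detected language."""
    # A language missing from the table raises KeyError; the text does not
    # say how that case is handled, so no fallback is assumed here.
    return RESPONSE_VOICE_TABLE[language_info]
```

Because the same language information selects the response voice and is later forwarded to the MFP 140, a single detection result can keep every voice output in one language.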

つまり、本実施形態では、入力音声（第１音声）の言語と、応答音声（第２音声）の言語とが同一となる。また、検出部６０２により生成される言語情報は、「入力音声（第１音声）の言語を特定可能な言語情報」でもあり、「応答音声（第２音声）の言語を特定可能な言語情報」でもある。 That is, in the present embodiment, the language of the input voice (first voice) and the language of the response voice (second voice) are the same. The language information generated by the detection unit 602 is therefore both "language information that can specify the language of the input voice (first voice)" and "language information that can specify the language of the response voice (second voice)".

また、送受信部６０８は、言語情報とジョブ情報とをコマンドとして第２ネットワーク８０を経由して第２情報処理装置１００に対して送信する。第２情報処理装置１００は、このコマンドを第３ネットワーク１２０を経由してＭＦＰ１４０に対して送信する。 Further, the transmission / reception unit 608 transmits the language information and the job information as commands to the second information processing apparatus 100 via the second network 80. The second information processing device 100 transmits this command to the MFP 140 via the third network 120.
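
A command that carries both the language information and the job information might be modeled as below. The disclosure does not specify a command format, so the use of JSON, the field names, and the helper names `build_command` and `parse_command` are assumptions made purely for illustration.

```python
import json


def build_command(language_info: str, job_info: dict) -> str:
    """Bundle language information and job information into one command string."""
    return json.dumps({"language": language_info, "job": job_info}, sort_keys=True)


def parse_command(command: str):
    """Split a received command back into (language information, job information)."""
    payload = json.loads(command)
    return payload["language"], payload["job"]


# Sender side (first information processing device, in this sketch).
command = build_command("en", {"type": "copy", "copies": 2})
# Receiver side (MFP, in this sketch).
language_info, job_info = parse_command(command)
```

Keeping the two pieces of information in one message lets the receiver route the language to the switching logic and the job to the execution logic from a single acquisition step.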

［ＭＦＰの機能構成例］ [Example of MFP functional configuration]
次に、ＭＦＰ１４０の機能構成例を説明する。図５は、ＭＦＰ１４０の機能ブロックの一例を示す図である。図５に示すように、ＭＦＰ１４０の制御部３１は、取得部１４０２と、実行部１４０４と、切換部１４０６と、設定部１４０８と、スピーカー制御部１４１０と、表示制御部１４１２と、記憶部１４１４との機能を有する。また、破線で示している「禁止部１４１６」については後述する。 Next, a functional configuration example of the MFP 140 will be described. FIG. 5 is a diagram showing an example of the functional blocks of the MFP 140. As shown in FIG. 5, the control unit 31 of the MFP 140 has the functions of an acquisition unit 1402, an execution unit 1404, a switching unit 1406, a setting unit 1408, a speaker control unit 1410, a display control unit 1412, and a storage unit 1414. The "prohibition unit 1416" shown by the broken line will be described later.

取得部１４０２は、言語情報とジョブ情報とが含まれるコマンドを取得する。つまり、取得部１４０２は、第１情報処理装置６０の送受信部６０８により送信された入力音声の言語情報（言語）を取得する言語取得部として機能する。言語情報は、切換部１４０６に送信される。 The acquisition unit 1402 acquires a command including language information and job information. That is, the acquisition unit 1402 functions as a language acquisition unit that acquires the language information (language) of the input voice transmitted by the transmission / reception unit 608 of the first information processing device 60. The language information is transmitted to the switching unit 1406.

また、取得部１４０２は、第１情報処理装置６０の送受信部６０８により送信されたジョブ情報（ジョブ）を取得するジョブ取得部として機能する。ジョブ情報は、実行部１４０４に送信される。 In addition, the acquisition unit 1402 functions as a job acquisition unit that acquires job information (job) transmitted by the transmission / reception unit 608 of the first information processing device 60. The job information is transmitted to the execution unit 1404.

実行部１４０４は、該ジョブ情報に基づいたジョブを処理ユニット１１に実行させる。例えば、ジョブ情報が、画像形成処理を示すジョブである場合には、実行部１４０４は、処理ユニット１１のうちの画像形成ユニットに対して、画像形成処理を実行させる。また、ジョブ情報が、スキャン処理を示すジョブである場合には、実行部１４０４は、処理ユニット１１のうちのスキャンユニットに対して、スキャン処理を実行させる。 The execution unit 1404 causes the processing unit 11 to execute a job based on the job information. For example, when the job information is a job indicating an image forming process, the execution unit 1404 causes the image forming unit of the processing units 11 to execute the image forming process. When the job information is a job indicating a scan process, the execution unit 1404 causes the scan unit among the processing units 11 to execute the scan process.

記憶部１４１４には、複数の言語の音声データと、複数の言語の画像データとが格納されている。図５の例では、複数の言語の音声データは、日本語音声データと、英語音声データと、中国語音声データである。また、図５の例では、複数の言語の画像データは、日本語画像データと、英語画像データと、中国語画像データである。 The storage unit 1414 stores voice data in a plurality of languages and image data in a plurality of languages. In the example of FIG. 5, the voice data in the plurality of languages are Japanese voice data, English voice data, and Chinese voice data. Further, in the example of FIG. 5, the image data in the plurality of languages are Japanese image data, English image data, and Chinese image data.

設定部１４０８は、複数の言語のうちいずれかの言語を設定する。以下では、設定部１４０８により設定されている言語を「設定言語」ともいう。設定部１４０８は、例えば、ユーザーの設定操作により指定された言語を設定する。ユーザーによる設定操作は、操作パネル３４に対して行われる。なお、図５の例では、設定している音声データおよび画像データを太線で囲っている。図５の例では、設定部１４０８は、日本語を設定している。したがって、図５の例では、日本語音声データと、日本語画像データとが太線で囲まれている。なお、典型的には、設定部１４０８は、ＭＦＰ１４０が設置されている国の母国語（本実施形態では、日本語）をデフォルトの言語として設定している。 The setting unit 1408 sets one of a plurality of languages. Hereinafter, the language set by the setting unit 1408 is also referred to as a “setting language”. The setting unit 1408 sets, for example, the language specified by the user's setting operation. The setting operation by the user is performed on the operation panel 34. In the example of FIG. 5, the set audio data and image data are surrounded by a thick line. In the example of FIG. 5, the setting unit 1408 sets Japanese. Therefore, in the example of FIG. 5, the Japanese audio data and the Japanese image data are surrounded by a thick line. Typically, the setting unit 1408 sets the native language of the country in which the MFP 140 is installed (Japanese in the present embodiment) as the default language.

設定部１４０８は、設定している言語での音声データと画像データとを抽出する。設定部１４０８は、設定している言語での音声データをスピーカー制御部１４１０に対して送信する。スピーカー制御部１４１０は、設定している言語での音声データに基づく音声をスピーカー３５から出力させる。これにより、制御部３１は、設定部１４０８が設定している言語での音声をスピーカー３５から出力させることができる。 The setting unit 1408 extracts audio data and image data in the set language. The setting unit 1408 transmits the voice data in the set language to the speaker control unit 1410. The speaker control unit 1410 outputs the voice based on the voice data in the set language from the speaker 35. As a result, the control unit 31 can output the sound in the language set by the setting unit 1408 from the speaker 35.

また、設定部１４０８は、設定している言語での画像データを表示制御部１４１２に対して送信する。表示制御部１４１２は、設定している言語での画像データに基づく画面をディスプレイ１０５１に表示させる。この画面には、１以上の文字が含まれる。これにより、制御部３１は、設定部１４０８が設定している言語での画面（文字）をディスプレイ１０５１に表示させることができる。 Further, the setting unit 1408 transmits the image data in the set language to the display control unit 1412. The display control unit 1412 causes the display 1051 to display a screen based on the image data in the set language. This screen contains one or more characters. As a result, the control unit 31 can display the screen (characters) in the language set by the setting unit 1408 on the display 1051.

また、ディスプレイ１０５１に表示される画面は、ジョブ設定画像と、オプション変更画像とを含む。図６は、ディスプレイ１０５１の表示領域１０５１Ａに表示される表示画面の一例である。図６の例は、設定言語が日本語である場合の表示画面の例である。 The screen displayed on the display 1051 includes a job setting image and an option change image. FIG. 6 is an example of a display screen displayed in the display area 1051A of the display 1051. The example of FIG. 6 is an example of a display screen when the setting language is Japanese.

図６の例では、表示画面には、ジョブ設定画像３２０と、オプション変更画像３２２とが含まれる。ジョブ設定画像３２０は、例えば、ユーザーから発せられた入力音声に基づくジョブが設定されたことを示す画像である。例えば、入力音声が、「カラーコピー２枚して下さい」という音声である場合には、ジョブ設定画像として、２枚のコピーが実行される旨の画像が表示される。図６の例でのジョブ設定画像３２０は、コピー部数として「２部」が設定されている画像（つまり、２枚のコピーが実行される旨の画像）である。 In the example of FIG. 6, the display screen includes the job setting image 320 and the option change image 322. The job setting image 320 is, for example, an image showing that a job based on an input voice issued by a user has been set. For example, when the input voice is a voice saying "Please make two color copies", an image indicating that two copies are executed is displayed as a job setting image. The job setting image 320 in the example of FIG. 6 is an image in which "2 copies" is set as the number of copies (that is, an image indicating that two copies are executed).

図６の例では、変更されたオプションを示す情報が太線で囲まれている。つまり、図６の例では、変更されたオプションとして、「カラーコピー」が設定されている。 In the example of FIG. 6, the information indicating the changed option is surrounded by a thick line. That is, in the example of FIG. 6, "color copy" is set as the changed option.

ＭＦＰ１４０が、ディスプレイ１０５１にジョブ設定画像３２０と、オプション変更画像３２２とを表示することにより、ユーザーが音声でジョブを入力した場合であっても、設定されたジョブおよび変更されたオプションを該ユーザーは確認することができる。 By displaying the job setting image 320 and the option change image 322 on the display 1051, the MFP 140 allows the user to display the set job and the changed option even when the user inputs the job by voice. You can check.

また、本実施形態では、切換部１４０６は、設定部１４０８が設定している言語（例えば、デフォルトの言語）から、取得部１４０２が取得した言語情報が示す言語に切換える切換処理を実行する。例えば、設定部１４０８が、日本語を設定している場合において、取得部１４０２が英語の言語情報を取得した場合には、切換部１４０６は、設定言語を日本語から英語に切換える。 Further, in the present embodiment, the switching unit 1406 executes a switching process of switching from the language set by the setting unit 1408 (for example, the default language) to the language indicated by the language information acquired by the acquisition unit 1402. For example, when the setting unit 1408 sets Japanese and the acquisition unit 1402 acquires English language information, the switching unit 1406 switches the setting language from Japanese to English.

設定部１４０８が日本語を設定している場合には、スピーカー制御部１４１０は、スピーカー３５から日本語の音声を出力させるとともに、表示制御部１４１２は、ディスプレイ１０５１に日本語の画像（文字）を表示させる。そして、切換部１４０６が、設定部１４０８が設定している言語を、日本語から英語に切換えた場合には、スピーカー制御部１４１０は、スピーカー３５から英語の音声（切換部１４０６により切換えられた言語の音声）を出力させるとともに、表示制御部１４１２は、ディスプレイ１０５１に英語の文字（切換部１４０６により切換えられた言語の文字）を表示させる。切換部１４０６が設定言語を英語に切換えた場合には、図６のジョブ設定画像３２０と、オプション変更画像３２２などを英語で表示する。 When the setting unit 1408 has set Japanese, the speaker control unit 1410 causes the speaker 35 to output Japanese voice, and the display control unit 1412 causes the display 1051 to display Japanese images (characters). When the switching unit 1406 switches the language set by the setting unit 1408 from Japanese to English, the speaker control unit 1410 causes the speaker 35 to output English voice (voice in the language switched to by the switching unit 1406), and the display control unit 1412 causes the display 1051 to display English characters (characters in the language switched to by the switching unit 1406). When the switching unit 1406 has switched the setting language to English, the job setting image 320, the option change image 322, and the like of FIG. 6 are displayed in English.

また、実行部１４０４の制御に基づく処理ユニット１１のジョブが完了した場合には、スピーカー制御部１４１０は、完了音声を出力する。スピーカー制御部１４１０は、完了音声を、設定部１４０８により設定されている言語で出力する。例えば、設定部１４０８が、日本語を設定している場合には、完了音声として、スピーカー３５は、「コピーが終了しました」という音声を出力する。また、設定部１４０８が、英語を設定している場合には（設定言語が英語に切換えられている場合には）、完了音声として、スピーカー３５は、「Copy finished」という音声を出力する。 Further, when the job of the processing unit 11 under the control of the execution unit 1404 is completed, the speaker control unit 1410 outputs the completion voice. The speaker control unit 1410 outputs the completion voice in the language set by the setting unit 1408. For example, when the setting unit 1408 has set Japanese, the speaker 35 outputs the voice "コピーが終了しました" ("Copy finished") as the completion voice. When the setting unit 1408 has set English (when the setting language has been switched to English), the speaker 35 outputs the voice "Copy finished" as the completion voice.

また、実行部１４０４の制御に基づく処理ユニット１１のジョブの処理において、エラーが発生したと判断された場合には、エラーが発生したことを示すエラー音声を出力する。ここで、エラー音声は、エラーが発生したことを示す音声と、該エラーの内容を示す音声とを含む。 Further, when it is determined that an error has occurred in the job processing of the processing unit 11 based on the control of the execution unit 1404, an error voice indicating that the error has occurred is output. Here, the error voice includes a voice indicating that an error has occurred and a voice indicating the content of the error.

例えば、エラーが「トナー切れというエラー」である場合には、エラーが発生したことを示すエラー発生音声は、「エラーが発生しました」という音声であり、エラーの内容を示すエラー内容音声は、「トナー切れです」という音声である。本実施形態では、スピーカー３５は、エラー音声として、エラー発生音声を出力し、その後、続けて、エラー内容音声を出力する。例えば、スピーカー３５は、「エラーが発生しました。トナー切れです」というエラー音声を出力する。 For example, when the error is "error of out of toner", the error occurrence voice indicating that the error has occurred is the voice "an error has occurred", and the error content voice indicating the content of the error is The voice says "Toner is out". In the present embodiment, the speaker 35 outputs an error occurrence sound as an error sound, and then continuously outputs an error content sound. For example, the speaker 35 outputs an error voice saying "An error has occurred. Toner is out."

また、本実施形態のＭＦＰ１４０は、切換部１４０６により言語が切換えられている場合であっても、切換部１４０６により言語が切換えられていない場合であっても、スピーカー３５は、デフォルトの言語（本実施形態では、日本語）で、エラー発生音声、およびエラー内容音声を出力する。デフォルトの言語は、切換部１４０６による切換前の言語である。 Further, in the MFP 140 of the present embodiment, the speaker 35 outputs the error occurrence voice and the error content voice in the default language (Japanese in the present embodiment) regardless of whether or not the language has been switched by the switching unit 1406. The default language is the language before switching by the switching unit 1406.
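
The difference between the two kinds of voice output (completion voice in the setting language, error voice always in the default language) can be sketched as follows. The language codes, the `VOICES` table, and the function name `select_voice` are hypothetical; strings stand in for audio data, and the English error phrase is only a placeholder that this rule never selects.

```python
DEFAULT_LANGUAGE = "ja"  # assumed code for the default language of this embodiment

# Hypothetical voice phrases per language; strings stand in for audio data.
VOICES = {
    "ja": {
        "completion": "コピーが終了しました",
        "error": "エラーが発生しました。トナー切れです",
    },
    "en": {
        "completion": "Copy finished",
        "error": "An error has occurred. Toner is out.",  # never selected below
    },
}


def select_voice(event: str, setting_language: str) -> str:
    """Pick the phrase to output for an event ("completion" or "error")."""
    # Error voices ignore the switched language and use the default language.
    language = DEFAULT_LANGUAGE if event == "error" else setting_language
    return VOICES[language][event]
```

With the setting language switched to English, `select_voice("completion", "en")` yields the English completion phrase, while `select_voice("error", "en")` still yields the Japanese error phrase.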

また、取得部１４０２が取得した言語情報が示す言語の音声データが記憶部１４１４に記憶されていない場合がある。この場合とは、例えば、取得部１４０２が、フランス語を示す言語情報を取得した場合である。取得部１４０２がフランス語を示す言語情報を取得した場合には、制御部３１は、フランス語に対応する音声データを第１提供装置１５０（図２参照）に要求する。第１提供装置１５０は、複数の言語それぞれの音声データを記憶する音声データベースを保持する。第１提供装置１５０は、ＭＦＰ１４０に対して該音声データを提供する。該音声データは、完了音声の音声データと、エラー音声の音声データとを含む。 Further, the audio data of the language indicated by the language information acquired by the acquisition unit 1402 may not be stored in the storage unit 1414. This case is, for example, a case where the acquisition unit 1402 acquires linguistic information indicating French. When the acquisition unit 1402 acquires the linguistic information indicating French, the control unit 31 requests the first providing device 150 (see FIG. 2) for voice data corresponding to French. The first providing device 150 holds a voice database that stores voice data of each of a plurality of languages. The first providing device 150 provides the audio data to the MFP 140. The voice data includes the voice data of the completed voice and the voice data of the error voice.

図７は、第１提供装置１５０が保持する音声データベースの一例を示す図である。図７の音声データベースでは、複数の言語である日本語、英語、中国語、フランス語、およびドイツ語それぞれに対して音声データが記憶されている。図７の例では、日本語に対しては日本語用音声データが記憶されており、英語に対しては英語用音声データが記憶されており、中国語に対しては中国語用音声データが記憶されており、フランス語に対してはフランス語用音声データが記憶されており、ドイツ語に対してはドイツ語用音声データが記憶されている。また、他の言語についても音声データが記憶されている。 FIG. 7 is a diagram showing an example of the voice database held by the first providing device 150. In the voice database of FIG. 7, voice data is stored for each of a plurality of languages: Japanese, English, Chinese, French, and German. In the example of FIG. 7, Japanese voice data is stored for Japanese, English voice data for English, Chinese voice data for Chinese, French voice data for French, and German voice data for German. Voice data is also stored for other languages.

制御部３１は、フランス語に対応するフランス語用音声データを要求するための要求信号を第１提供装置１５０に送信する。要求信号には、言語情報（ここでは、フランス語の言語情報）が含まれている。 The control unit 31 transmits a request signal for requesting French voice data corresponding to French to the first providing device 150. The request signal contains linguistic information (here, French linguistic information).

第１提供装置１５０は、要求信号を受信すると、該要求信号に含まれる言語情報を抽出し、該抽出された言語情報に対応する音声データを音声データベースから取得する。第１提供装置１５０は、取得した音声データを、要求信号の送信元のＭＦＰ１４０に対して送信する。 Upon receiving the request signal, the first providing device 150 extracts the language information included in the request signal and acquires the voice data corresponding to the extracted language information from the voice database. The first providing device 150 transmits the acquired voice data to the MFP 140, which is the source of the request signal.

取得部１４０２は、第１提供装置１５０に対して要求していた音声データを、第１提供装置１５０から取得する。取得部１４０２は、該取得した音声データを記憶部１４１４に一旦、記憶させる。さらに、設定部１４０８は、該記憶した音声データ（フランス語の音声データ）をスピーカー制御部１４１０に送信する。スピーカー制御部１４１０は、該音声データに基づいた音声（完了音声またはエラー音声等）をスピーカー３５により出力させる。記憶部１４１４に記憶された音声データ（フランス語の音声データ）を削除するようにしてもよく、残存させるようにしてもよい。 The acquisition unit 1402 acquires the voice data requested for the first providing device 150 from the first providing device 150. The acquisition unit 1402 temporarily stores the acquired voice data in the storage unit 1414. Further, the setting unit 1408 transmits the stored voice data (French voice data) to the speaker control unit 1410. The speaker control unit 1410 causes the speaker 35 to output a voice (completion voice, error voice, etc.) based on the voice data. The voice data (French voice data) stored in the storage unit 1414 may be deleted or may be retained.

また、スピーカー３５がフランス語での完了音声を出力するまでに、フランス語での音声データがＭＦＰ１４０に対して送信されない場合がある。例えば、音声データの容量が大きく、スピーカー３５がフランス語での完了音声を出力するときまでに、ＭＦＰ１４０に音声データが届かない場合などである。換言すれば、第１提供装置１５０からの音声データの取得前に、完了音声を出力する場合である。この場合には、スピーカー制御部１４１０は、要求している音声データ（切換処理により切換えられた言語での音声での音声データ）ではなく、デフォルトの言語（切換処理の前の言語であり、本実施形態では日本語）の音声データでの音声を出力する。 Further, the French voice data may not have been transmitted to the MFP 140 by the time the speaker 35 is to output the completion voice in French. This happens, for example, when the voice data is large and does not reach the MFP 140 by the time the speaker 35 is to output the completion voice in French. In other words, this is the case where the completion voice is output before the voice data has been acquired from the first providing device 150. In this case, the speaker control unit 1410 outputs voice based not on the requested voice data (voice data in the language switched to by the switching process) but on the voice data of the default language (the language before the switching process, which is Japanese in the present embodiment).
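
The request-then-fall-back behavior for voice data that is not stored locally can be sketched as follows. This is a simplified model under assumptions: plain dictionaries stand in for the storage unit 1414 and the voice database of the first providing device 150, transfer delay is modeled by simply not having called `request_voice_data` yet, and all names are hypothetical.

```python
DEFAULT_LANGUAGE = "ja"  # assumed code for the language before the switching process


def request_voice_data(language, storage, provider_db):
    """Fetch voice data from the provider's database and store it locally."""
    if language in provider_db:
        storage[language] = provider_db[language]


def completion_voice(requested_language, storage):
    """Use the requested language if its data has arrived, else the default language."""
    if requested_language in storage:
        return storage[requested_language]
    return storage[DEFAULT_LANGUAGE]


storage = {"ja": "ja-completion"}       # local storage holds Japanese only
provider_db = {"fr": "fr-completion"}   # provider also holds French

before = completion_voice("fr", storage)   # data not yet received
request_voice_data("fr", storage, provider_db)
after = completion_voice("fr", storage)    # data now stored locally
```

Here `before` is the Japanese data and `after` is the French data, which mirrors the rule that the completion voice is never delayed waiting for a download.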

また、取得部１４０２が取得した言語情報が示す言語の画像データが記憶部１４１４に記憶されていない場合がある。この場合とは、例えば、取得部１４０２は、フランス語を示す言語情報を取得した場合である。取得部１４０２は、フランス語を示す言語情報を取得した場合には、制御部３１は、フランス語に対応する画像データを第２提供装置１６０（図２参照）に要求する。 Further, the image data of the language indicated by the language information acquired by the acquisition unit 1402 may not be stored in the storage unit 1414. In this case, for example, the acquisition unit 1402 acquires linguistic information indicating French. When the acquisition unit 1402 acquires linguistic information indicating French, the control unit 31 requests the second providing device 160 (see FIG. 2) for image data corresponding to French.

第２提供装置１６０は、複数の言語それぞれの画像データを記憶する画像データベースを保持する。第２提供装置１６０は、ＭＦＰ１４０に対して該画像データを提供する。該画像データは、ジョブ設定画像３２０の画像データと、オプション変更画像３２２の画像データとを含む。 The second providing device 160 holds an image database that stores image data for each of the plurality of languages. The second providing device 160 provides the image data to the MFP 140. The image data includes the image data of the job setting image 320 and the image data of the option change image 322.

図８は、画像データベースの一例を示す図である。図８の画像データベースでは、複数の言語である日本語、英語、中国語、フランス語、およびドイツ語それぞれに対して画像データが記憶されている。図８の例では、日本語に対しては日本語用画像データが記憶されており、英語に対しては英語用画像データが記憶されており、中国語に対しては中国語用画像データが記憶されており、フランス語に対してはフランス語用画像データが記憶されており、ドイツ語に対してはドイツ語用画像データが記憶されている。また、他の言語についても画像データが記憶されている。 FIG. 8 is a diagram showing an example of the image database. In the image database of FIG. 8, image data is stored for each of a plurality of languages: Japanese, English, Chinese, French, and German. In the example of FIG. 8, Japanese image data is stored for Japanese, English image data for English, Chinese image data for Chinese, French image data for French, and German image data for German. Image data is also stored for other languages.

制御部３１は、フランス語に対応する画像データを要求するための要求信号を第２提供装置１６０に送信する。要求信号には、言語情報（ここでは、フランス語の言語情報）が含まれている。 The control unit 31 transmits a request signal for requesting image data corresponding to French to the second providing device 160. The request signal contains linguistic information (here, French linguistic information).

第２提供装置１６０は、要求信号を受信すると、該要求信号に含まれる言語情報を抽出し、該抽出された言語情報に対応する画像データを画像データベースから取得する。第２提供装置１６０は、取得した画像データを、要求信号の送信元のＭＦＰ１４０に対して送信する。 Upon receiving the request signal, the second providing device 160 extracts the language information included in the request signal and acquires the image data corresponding to the extracted language information from the image database. The second providing device 160 transmits the acquired image data to the MFP 140, which is the source of the request signal.

取得部１４０２は、第２提供装置１６０に対して要求していた画像データを、第２提供装置１６０から取得する。取得部１４０２は、該取得した画像データを記憶部１４１４に一旦、記憶させる。さらに、設定部１４０８は、該記憶した画像データ（フランス語の画像データ）を表示制御部１４１２に送信する。表示制御部１４１２は、該画像データに基づいた画像（文字）をディスプレイ１０５１に表示させる。 The acquisition unit 1402 acquires the image data requested for the second providing device 160 from the second providing device 160. The acquisition unit 1402 temporarily stores the acquired image data in the storage unit 1414. Further, the setting unit 1408 transmits the stored image data (French image data) to the display control unit 1412. The display control unit 1412 causes the display 1051 to display an image (character) based on the image data.

また、所定の条件が成立した場合に、切換部１４０６は、設定部１４０８により設定されている言語を、切換前の言語に戻す。本実施形態では、所定の条件は、入力音声に基づくジョブが完了するという条件である。つまり、切換部１４０６は、入力音声に基づくジョブが完了した場合には、切換部１４０６は、設定部１４０８により設定されている言語を、切換前の言語に戻す。 Further, when a predetermined condition is satisfied, the switching unit 1406 returns the language set by the setting unit 1408 to the language before switching. In the present embodiment, the predetermined condition is that the job based on the input voice is completed. That is, when the job based on the input voice is completed, the switching unit 1406 returns the language set by the setting unit 1408 to the language before switching.

例えば、切換部１４０６により言語が日本語から英語に切換えられている場合において、実行部１４０４によるジョブが完了した場合には、切換部１４０６は、英語から日本語に戻す。設定言語が英語から日本語に戻されることにより、スピーカー３５は、切換処理の前の言語（つまり、日本語）での音声を出力可能となる。また、設定言語が英語から日本語に戻されることにより、ディスプレイ１０５１は、切換処理の前の言語（つまり、日本語）での画像を表示可能となる。 For example, when the language is switched from Japanese to English by the switching unit 1406 and the job by the execution unit 1404 is completed, the switching unit 1406 returns from English to Japanese. By returning the set language from English to Japanese, the speaker 35 can output voice in the language (that is, Japanese) before the switching process. Further, by returning the set language from English to Japanese, the display 1051 can display an image in the language (that is, Japanese) before the switching process.
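
The switch-and-restore behavior around a job can be sketched with a context manager. The class and function names are hypothetical, and restoring the language in a `finally` clause (so that it is also restored if the job raises an exception) is an assumption of this sketch; the text only states that the language is returned when the job completes.

```python
from contextlib import contextmanager


class LanguageSetting:
    """Hypothetical holder for the setting language (default assumed to be "ja")."""

    def __init__(self, default: str = "ja"):
        self.language = default


@contextmanager
def switched_language(setting: LanguageSetting, language: str):
    """Switch the setting language for the duration of one job, then restore it."""
    previous = setting.language   # language before the switching process
    setting.language = language   # switching process
    try:
        yield setting
    finally:
        setting.language = previous  # return to the language before switching


setting = LanguageSetting()
with switched_language(setting, "en"):
    during_job = setting.language  # voice and screen would use English here
after_job = setting.language       # back to Japanese once the job is done
```

Restoring the previous language after each job keeps a temporary voice-driven switch from changing the language seen by the next user of the operation panel.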

［画像形成システムの処理フロー］ [Processing flow of the image forming system]
図９は、画像形成システム１０００の処理のフローチャートの一例を示す図である。図９の例では、第１情報処理装置６０と、第２情報処理装置１００と、ＭＦＰ１４０との処理を示す。 FIG. 9 is a diagram showing an example of a flowchart of the processing of the image forming system 1000. The example of FIG. 9 shows the processing of the first information processing device 60, the second information processing device 100, and the MFP 140.

まず、ステップＳ２において、第１情報処理装置６０は、音声受付開始処理を実行する。音声受付開始処理は、例えば、スマートスピーカー２０に「音声入力してください」等といった音声を出力させる処理である。第１情報処理装置６０は、予め設定されたデフォルトの言語（本実施形態では、日本語）で、この音声を出力させる。その後、ステップＳ３において、スマートスピーカー２０からの入力音声を、第１情報処理装置６０が取得すると、第１情報処理装置６０は、セッションを開始する（セッションを接続させる）。セッションは、第１ネットワーク４０、第１情報処理装置６０、第２ネットワーク８０，第２情報処理装置１００，および第３ネットワーク１２０を介してのスマートスピーカー２０とＭＦＰ１４０とのセッションである。 First, in step S2, the first information processing device 60 executes a voice reception start process. The voice reception start process is, for example, a process of causing the smart speaker 20 to output a voice such as "Please input voice". The first information processing device 60 causes this voice to be output in a preset default language (Japanese in the present embodiment). After that, in step S3, when the first information processing device 60 acquires the input voice from the smart speaker 20, the first information processing device 60 starts a session (establishes a session connection). The session is a session between the smart speaker 20 and the MFP 140 via the first network 40, the first information processing device 60, the second network 80, the second information processing device 100, and the third network 120.

After that, when the first information processing device 60 acquires, via the first network 40, the input voice data of the input voice from the user U (see, for example, FIG. 1(A)), the detection unit 602 detects the language of the input voice in step S4 by performing voice analysis or the like on the input voice. Next, in step S6, the detection unit 602 detects the job content by performing voice analysis or the like on the input voice.

Next, the conversion unit 604 of the first information processing device 60 converts the language information into response voice data. In step S7, the first information processing device 60 transmits the response voice data to the smart speaker 20 via the first network 40, thereby causing the smart speaker 20 to output a response voice based on the response voice data. This response voice is in the language detected by the detection unit 602.

Next, in step S8, if the detection unit 602 has detected job content (that is, if the input voice includes job content), the transmission/reception unit 608 transmits the language information and job information indicating the job content to the second information processing device 100 via the second network 80. If the detection unit 602 has not detected job content (for example, if the input voice does not include job content), the process proceeds to step S10 without executing the process of step S8.

Next, in step S10, if the detection unit 602 has detected option content (that is, if the input voice includes option content), the transmission/reception unit 608 transmits option information indicating the option content to the second information processing device 100 via the second network 80. If the detection unit 602 has not detected option content (for example, if the input voice does not include option content), the process proceeds to step S12 without executing the process of step S10.

Next, in step S12, if the detection unit 602 has detected a job execution instruction (that is, if the input voice includes a job execution instruction), the transmission/reception unit 608 transmits the job execution instruction to the second information processing device 100 via the second network 80. If the detection unit 602 has not detected a job execution instruction (for example, if the input voice does not include a job execution instruction), the process proceeds to step S13 without executing the process of step S12.

When the process of step S12 is completed, the first information processing device 60 ends the session in step S13. This is the session that was started in step S3, so the session between the smart speaker 20 and the MFP 140 ends. After that, in step S14, the first information processing device 60 issues a voice re-input request to the smart speaker 20. The voice re-input request is a process of causing the smart speaker 20 to output a voice prompt such as "Please input voice" again. The voice for the re-input request may be in the preset default language described above, or in the language detected in step S4.
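The conditional transmissions of steps S8, S10, and S12 can be sketched as follows. This is only an illustrative sketch: the `Detection` class, the `messages_to_forward` function, and the dictionary keys are hypothetical names that do not appear in the disclosure, and the actual message format is not specified there.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Detection:
    """Hypothetical result of analysing one input voice (steps S4 and S6)."""
    language: str                 # language detected in step S4
    job: Optional[str] = None     # job content, if any was detected
    option: Optional[str] = None  # option content, if any was detected
    execute: bool = False         # True if a job execution instruction was detected


def messages_to_forward(d: Detection) -> List[Tuple[str, Dict]]:
    """List the messages sent to the second information processing device."""
    out: List[Tuple[str, Dict]] = []
    if d.job is not None:       # step S8: language information plus job information
        out.append(("S8", {"language": d.language, "job": d.job}))
    if d.option is not None:    # step S10: option information
        out.append(("S10", {"option": d.option}))
    if d.execute:               # step S12: job execution instruction
        out.append(("S12", {"execute": True}))
    return out
```

Each step is skipped independently, so an input voice that names a job and asks for execution but gives no option would produce only the S8 and S12 messages.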

Next, the processing of the second information processing device 100 will be described. When the language information and the job information are transmitted from the first information processing device 60 in step S8, the second information processing device 100 transmits the received language information and job information to the MFP 140 in step S32.

When the option information is transmitted from the first information processing device 60 in step S10, the second information processing device 100 transmits the received option information to the MFP 140 in step S34.

When the job execution instruction is transmitted from the first information processing device 60 in step S12, the second information processing device 100 transmits the received job execution instruction to the MFP 140 in step S36.

Next, the processing of the MFP 140 will be described. When the language information and the job information are transmitted from the second information processing device 100 in step S32, the MFP 140 executes the first process in step S60. When the option information is transmitted from the second information processing device 100 in step S34, the MFP 140 executes the second process in step S62. When the job execution instruction is transmitted from the second information processing device 100 in step S36, the MFP 140 executes the third process in step S64.

FIG. 10 is an example of a flowchart of the first process. In step S601, the control unit 31 stores the job content included in the job information transmitted in step S32 in a predetermined storage area (for example, the storage unit 1414).

Next, in step S602, the switching unit 1406 determines whether or not the language indicated by the language information transmitted in step S32 differs from the default language. If the switching unit 1406 determines that the two languages differ (YES in step S602), the process proceeds to step S603. In step S603, the switching unit 1406 switches the set language from the default language (for example, Japanese) to the language indicated by the language information (for example, English). If the switching unit 1406 determines that the language indicated by the language information is the same as the default language (NO in step S602), the process proceeds to step S604. The process also proceeds to step S604 when the process of step S603 is completed.

In step S604, the setting unit 1408 determines whether or not voice data of the switched language is stored in the storage unit 1414. If the setting unit 1408 determines that voice data of the switched language is not stored in the storage unit 1414 (NO in step S604), the control unit 31 requests the missing voice data from the first providing device 150 in step S606. Next, in step S607, the control unit 31 stores a requested flag in a predetermined storage area (for example, the storage unit 1414). The requested flag indicates that the voice data determined not to be stored in the storage unit 1414 has been requested from the first providing device 150.

If YES is determined in step S604, or when the process of step S607 is completed, the process proceeds to step S608. In step S608, the setting unit 1408 determines whether or not image data of the switched language is stored in the storage unit 1414. If the setting unit 1408 determines that image data of the switched language is stored in the storage unit 1414 (YES in step S608), the process proceeds to step S610. In step S610, the display control unit 1412 displays the job setting image 320 in the switched language (an image based on the image data determined in step S608 to be stored in the storage unit 1414).

If the setting unit 1408 determines that image data of the switched language is not stored in the storage unit 1414 (NO in step S608), the process proceeds to step S612. In step S612, the control unit 31 requests the image data determined not to be stored in the storage unit 1414 from the second providing device 160. As indicated in parentheses in step S612, when the acquisition unit 1402 acquires the image data from the second providing device 160, the job setting image is displayed using the acquired image data. When the process of step S610 or step S612 is completed, the first process ends.
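The first process of FIG. 10 (steps S601 to S612) can be summarized in a short sketch. The `MfpState` class, its field names, and the log strings are hypothetical and serve only to make the branching visible; requests to the providing devices are recorded as log entries rather than performed over a network.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class MfpState:
    """Hypothetical, simplified model of the MFP state used by the first process."""
    default_language: str = "ja"
    set_language: str = "ja"
    stored_job: Optional[str] = None
    voice_languages: Set[str] = field(default_factory=lambda: {"ja"})  # voice data held locally
    image_languages: Set[str] = field(default_factory=lambda: {"ja"})  # image data held locally
    requested_flag: bool = False
    log: List[str] = field(default_factory=list)


def first_process(state: MfpState, language: str, job: str) -> None:
    state.stored_job = job                                # S601: store the job content
    if language != state.default_language:                # S602: differs from the default?
        state.set_language = language                     # S603: switch the set language
    lang = state.set_language
    if lang not in state.voice_languages:                 # S604: NO
        state.log.append(f"request voice data ({lang}) from first providing device")   # S606
        state.requested_flag = True                       # S607: remember the request
    if lang in state.image_languages:                     # S608: YES
        state.log.append(f"display job setting image in {lang}")                       # S610
    else:                                                 # S608: NO
        state.log.append(f"request image data ({lang}) from second providing device")  # S612
```

With the defaults above, an English request leaves two log entries (one voice data request and one image data request) and sets the requested flag, while a Japanese request only displays the job setting image.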

FIG. 11 is an example of a flowchart of the second process. In step S622, the control unit 31 changes the options based on the option information transmitted from the second information processing device 100 in step S34 and sets the changed options. Next, in step S624, the display control unit 1412 displays the option change image 322 reflecting the change. When the process of step S624 is completed, the second process ends.

FIG. 12 is an example of a flowchart of the third process. In step S642, the execution unit 1404 causes the processing unit 11 to execute the job whose content was stored in step S601 of FIG. 10. Next, in step S644, the control unit 31 determines whether or not the job has ended normally.

If the control unit 31 determines that the job has ended normally (YES in step S644), it determines in step S646 whether or not the requested flag is stored. The requested flag is the flag stored in step S607 of FIG. 10. If the control unit 31 determines that the requested flag is stored (YES in step S646), the process proceeds to step S648. A determination in step S646 that the requested flag is stored means that the MFP 140 requested voice data from the first providing device 150 in step S606 of FIG. 10.

In step S648, the control unit 31 determines whether or not the voice data requested from the first providing device 150 has been acquired from the first providing device 150. If NO is determined in step S648, that is, if the MFP 140 has not acquired the voice data of the completion voice even though it is time to output the completion voice, the process proceeds to step S652.

In step S652, the speaker control unit 1410 outputs the job completion voice in the pre-switch language (that is, the default language). Next, in step S662, if the set language has been switched, the switching unit 1406 returns it to the set language used before the switch. As a result, the image on the display 1051 is displayed in the pre-switch language.

Next, in step S664, if the requested flag is stored, the control unit 31 deletes it. The third process then ends.

If NO is determined in step S646, or if YES is determined in step S648, the process proceeds to step S650. In step S650, the speaker control unit 1410 outputs the job completion voice in the switched language. When step S650 is reached after a NO determination in step S646, the job completion voice is output based on the voice data stored in the storage unit 1414 (voice data of the switched language). When step S650 is reached after a YES determination in step S648, the job completion voice is output based on the voice data acquired from the first providing device 150 (voice data of the switched language).

If the control unit 31 determines in step S644 that the job did not end normally, that is, that an error occurred during job execution (NO in step S644), the process proceeds to step S660. In step S660, the speaker control unit 1410 outputs an error voice corresponding to the error in the default language (the pre-switch language). The process then proceeds to step S662.
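The choice of voice language in the third process of FIG. 12 can be expressed as a small decision function. The function and parameter names below are hypothetical. The sketch also assumes that step S650 continues to steps S662 and S664, as summary item (7) suggests; the text above states this transition explicitly only for steps S652 and S660.

```python
from typing import Tuple


def third_process(job_ok: bool, requested_flag: bool, voice_data_arrived: bool,
                  switched: str, default: str) -> Tuple[str, str, bool]:
    """Return (voice language, set language afterwards, requested flag afterwards)."""
    if not job_ok:
        voice = default      # S644: NO -> S660, error voice in the default language
    elif requested_flag and not voice_data_arrived:
        voice = default      # S646: YES, S648: NO -> S652, fall back to the default language
    else:
        voice = switched     # S646: NO or S648: YES -> S650, completion voice in the switched language
    set_language_after = default   # S662: return the set language to the pre-switch language
    flag_after = False             # S664: delete the requested flag if it is stored
    return voice, set_language_after, flag_after
```

For example, `third_process(True, True, False, "en", "ja")` models a job that finished before the English voice data arrived, so the completion voice falls back to Japanese.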

[Summary]
(1) The image forming system 1000 of the present embodiment includes the smart speaker 20 and the MFP 140. The smart speaker 20 receives input voice from the user and, when the input voice is received, outputs a response voice corresponding to it. The smart speaker 20 also transmits the input voice to the first information processing device 60.

When the first information processing device 60 receives the input voice from the smart speaker 20, it transmits language information (information indicating the language of the input voice and of the response voice) to the MFP 140. The acquisition unit 1402 acquires the language information transmitted from the first information processing device 60. If the language set by the setting unit 1408 (the default language) differs from the language indicated by the acquired language information (YES in step S602), the switching unit 1406 switches from the language set by the setting unit 1408 to the language indicated by the acquired language information. The speaker 35 outputs voice in the language switched to by the switching unit 1406.

As a result, as also shown in FIG. 1(D), the language of the response voice from the smart speaker 20 and the language of the completion voice output from the MFP 140 can be made the same without the user U changing the set language of the MFP 140. Therefore, the user can be prevented from feeling discomfort about the language of the voice, without having to change the set language.

(2) As shown in FIG. 4, the detection unit 602 of the first information processing device 60 detects the language of the input voice input to the smart speaker 20. The transmission/reception unit 608 of the first information processing device 60 transmits language information for the detected language to the MFP 140. The acquisition unit 1402 of the MFP 140 acquires the transmitted language information. The switching unit 1406 switches from the set language to the language indicated by the acquired language information. Therefore, whatever the language of the input voice, the switching unit 1406 can switch the set language to the language of the input voice.

(3) The image forming system 1000 further includes the first providing device 150, which provides voice data. If voice data corresponding to the language switched to by the switching unit 1406 is not stored in the storage unit 1414 (NO in step S604 of FIG. 10), the MFP 140 requests voice data corresponding to that language from the first providing device 150. The acquisition unit 1402 acquires the voice data from the first providing device 150. The speaker 35 outputs the job completion voice based on the acquired voice data. Therefore, even when the MFP 140 does not store voice data for the language being switched to, the voice data can be acquired from the first providing device 150. The MFP 140 therefore does not need to store voice data for many languages, and the storage capacity required of the MFP 140 can be reduced.

(4) When voice is to be output before the voice data has been acquired from the first providing device 150, that is, when the voice data from the first providing device 150 has not yet reached the MFP 140 at the time the completion voice is to be output (NO in step S648 of FIG. 12), the MFP 140 outputs voice in the pre-switch language (that is, the default language) rather than in the language switched to by the switching unit 1406. Therefore, the MFP 140 can respond flexibly even if the voice data from the first providing device 150 has not arrived when the completion voice is to be output.

(5) The display control unit 1412 can cause the display 1051 to display images containing text in the switched language (such as the job setting image 320 and the option change image 322 shown in FIG. 6). Therefore, images containing text in the switched language can be displayed without requiring a language change operation by the user, which improves user convenience.

(6) The image forming system 1000 further includes the second providing device 160, which provides image data. If image data corresponding to the language switched to by the switching process is not stored in the storage unit 1414, the MFP 140 requests image data corresponding to that language from the second providing device 160. The acquisition unit 1402 acquires the image data from the second providing device 160. The display control unit 1412 causes the display 1051 to display an image based on the acquired image data. Therefore, even when the MFP 140 does not store image data for the language being switched to, the image data can be acquired from the second providing device 160. The MFP 140 therefore does not need to store image data for many languages, and the storage capacity required of the MFP 140 can be reduced.

(7) When the speaker 35 has output the completion voice (a voice indicating that the job based on the input voice transmitted from the smart speaker 20 has been completed), the switching unit 1406 returns the set language to the pre-switch language (the default language). As a result, the speaker 35 can again output voice in the pre-switch language. This prevents the MFP 140 from remaining in the switched-language state for longer than necessary and, as a result, improves user convenience.

(8) Consider, for example, the following situation. A user who understands English but not Japanese uses an MFP 140 installed in Japan. The default language of the MFP 140 is Japanese. An administrator who manages the MFP 140 is present near the MFP 140, and this administrator understands Japanese but not English.

In this situation, suppose that the user (who understands English but not Japanese) utters the input voice "Please make two color copies", and that a toner-out error occurs in the resulting job of making two color copies on the MFP 140. If the MFP 140 were to output the voice indicating that an error has occurred (for example, "An error has occurred") and the voice indicating the content of the error (for example, "Out of toner") in the language after the switching process (that is, English), the administrator, who does not understand English, could understand neither that an error has occurred nor what the error is.

Therefore, when an error occurs, the speaker 35 outputs the voice indicating that an error has occurred (for example, "An error has occurred") and the voice indicating the content of the error (for example, "Out of toner") in the language used before the switching process (the default language). This allows an administrator who understands Japanese but not English to understand that an error has occurred and what the error is. The administrator can therefore manage the MFP 140 smoothly.

(9) The MFP 140 outputs the completion voice in the language switched to by the switching unit 1406. Therefore, the input voice from the user, the response voice from the smart speaker 20, and the completion voice from the MFP 140 can all be in the same language. Thus, when the MFP 140 completes a job based on the input voice, the user can be prevented from feeling discomfort about the language of the voice.

[Other Embodiments]
(1) In the above-described embodiment, the acquisition unit 1402 was described as acquiring the language information detected by the detection unit 602 of the first information processing device 60. However, the following configuration may be used instead. The smart speaker 20 may have a language set in advance. In this case, the smart speaker 20 outputs the response voice in the set language regardless of the language of the input voice from the user. The acquisition unit 1402 acquires, from the smart speaker 20, language information for the language set in the smart speaker 20. This language information is language information from which the language of the response voice (second voice) can be specified. Such a configuration reduces the load of analyzing the input voice on the first information processing device 60, while still making the language of the response voice from the smart speaker 20 the same as the language of the completion voice from the MFP 140.

(2) In the above-described embodiment, as also shown in step S662 of FIG. 12 and elsewhere, the predetermined condition under which the language of the image displayed on the display 1051 is returned to the pre-switch language was described as the output of the completion voice or the error voice. However, this predetermined condition may be another condition. For example, the condition may be that the connection (session) between the smart speaker 20 and the MFP 140 is disconnected. In this embodiment, the connection (session) between the smart speaker 20 and the MFP 140 is disconnected when the processes of steps S8 and S10 have been completed. That the processes of steps S8 and S10 have been completed means that the first process of step S60 and the second process of step S62 by the MFP 140 have been completed. That the first process and the second process have been executed means that the job setting image 320 and the option change image 322 have been displayed in the set language after the switching process, so the user has been able to view them in that language. Once the user has viewed the job setting image 320 and the option change image 322 in the set language after the switching process, these images are no longer very important to the user. Therefore, in this modification, the predetermined condition is, for example, that the connection (session) between the smart speaker 20 and the MFP 140 has ended.

Even with such a configuration, the display can be prevented from remaining in the switched language for longer than necessary and, as a result, user convenience can be improved.

Next, a method by which the MFP 140 recognizes that the session has ended will be described. For example, once the session starts, the MFP 140 transmits a predetermined signal to the first information processing device 60 at predetermined intervals (for example, every second). The first information processing device 60 is configured to send a reply signal back to the MFP 140 when it receives the predetermined signal. By receiving the reply signal, the MFP 140 recognizes that the session is continuing. If the MFP 140 has transmitted the predetermined signal to the first information processing device 60 but does not receive a reply signal, it determines that the session has ended.
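The probe-and-reply check described above can be sketched as a polling loop. The function name, the `probe` callback (which stands in for sending the predetermined signal and reporting whether a reply signal came back), and the injectable `sleep` parameter are all hypothetical; how long the MFP 140 waits for a reply before giving up is not specified in the text.

```python
import time
from typing import Callable


def monitor_session(probe: Callable[[], bool],
                    interval_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Probe until no reply is received; return how many probes were answered."""
    answered = 0
    while probe():          # True means a reply signal was received: session continues
        answered += 1
        sleep(interval_s)   # wait for the predetermined period before the next probe
    return answered         # the loop exits when the session is judged to have ended
```

Passing `sleep` as a parameter lets the loop be exercised without real delays, for example with a stub probe that answers twice and then stops replying.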

In this embodiment, the condition for ending the session was described as the completion of step S12, as shown in FIG. 9. However, the session may end under another condition, for example, when a predetermined time (for example, 5 seconds) has elapsed since the session started.

(3) The MFP 140 may also include a prohibition unit 1416 that prohibits control for displaying an image in the language switched to by the switching process. The prohibition unit 1416 is indicated by the broken line in FIG. 5. With the prohibition unit 1416, the display control unit 1412 no longer needs to execute control for displaying an image in the language switched to by the switching process. The processing load on the control unit 31, which includes the display control unit 1412, can therefore be reduced. Note that when the user performs a language switching operation, the prohibition unit 1416 need not prohibit control for displaying an image in the language switched to by the display switching process (that is, this display control may be executed).

The MFP 140 may also be configured without the display 1051. Such a configuration reduces both the size and the manufacturing cost of the MFP 140. Further, an MFP 140 without the display 1051 can be controlled appropriately by having the prohibition unit 1416.
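The behavior of the prohibition unit 1416 in modification (3) can be illustrated with the sketch below. It is written under stated assumptions: `LanguageState`, `on_voice_switch`, and `on_user_switch` are hypothetical names, languages are represented by short codes such as "ja" and "en", and a user operation is assumed to change only the display language.

```python
from dataclasses import dataclass


@dataclass
class LanguageState:
    """Hypothetical language settings held by the MFP."""

    voice_language: str = "ja"       # language used for voice output
    display_language: str = "ja"     # language used for displayed images
    display_prohibited: bool = True  # True when the prohibition unit is active


def on_voice_switch(state: LanguageState, language: str) -> None:
    """Switching process triggered by the language of the second voice."""
    state.voice_language = language
    # With the prohibition unit active, display control in the switched
    # language is skipped, so the display language is left unchanged.
    if not state.display_prohibited:
        state.display_language = language


def on_user_switch(state: LanguageState, language: str) -> None:
    """Language switching operation performed by the user on the device.

    Assumption: the prohibition does not apply to a user operation, so the
    display language is always updated here.
    """
    state.display_language = language
```

For example, starting from the defaults, `on_voice_switch(state, "en")` changes only `voice_language`, while a later `on_user_switch(state, "en")` also changes `display_language`.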

(4) In the above-described embodiment, when an error occurs in the MFP 140, the speaker 35 outputs a voice indicating that an error has occurred (for example, "An error has occurred") and a voice indicating the content of the error (for example, "Out of toner") in the language before the switching process (the default language). In this modification, however, the MFP 140 outputs the voice indicating that an error has occurred in the language switched to by the switching process, and outputs the voice indicating the content of the error in the language before switching. For example, when an out-of-toner error occurs, color copying on the MFP 140 stops and the MFP 140 outputs the error voice "An error has occurred トナー切れです", that is, the error notice in English followed by "Out of toner" in Japanese.

In the situation described above, if the error voice "エラーが発生しました" ("An error has occurred") were output in Japanese, a user who cannot understand Japanese would not understand why color copying on the MFP 140 stopped. Therefore, in this modification, the error occurrence voice, which indicates that an error has occurred, is output in the language after the switching process (English). The error content voice, which indicates the content of the error, is output in Japanese. Therefore, even an administrator who does not understand English can understand that an error has occurred and what the error is.

As a further modification, the MFP 140 may output both the error occurrence voice and the error content voice in the language after switching (for example, English). That is, the MFP 140 may output voices relating to the execution state of a job executed by the execution unit 1404 in the language after switching (for example, English).
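The three error-voice variants described above (the embodiment, modification (4), and the further modification) differ only in which language is chosen for each of the two voices. The sketch below illustrates that choice. The names `ErrorVoicePolicy`, `MESSAGES`, and `error_voice` are hypothetical, and the English wording "Out of toner" is an assumed rendering of the Japanese message.

```python
from enum import Enum
from typing import List, Tuple


class ErrorVoicePolicy(Enum):
    BOTH_DEFAULT = "both_default"        # embodiment: both voices in the default language
    NOTICE_SWITCHED = "notice_switched"  # modification (4): only the notice is switched
    BOTH_SWITCHED = "both_switched"      # further modification: both voices switched


# Assumed message table keyed by language code, then by message key.
MESSAGES = {
    "ja": {"occurred": "エラーが発生しました", "out_of_toner": "トナー切れです"},
    "en": {"occurred": "An error has occurred", "out_of_toner": "Out of toner"},
}


def error_voice(
    policy: ErrorVoicePolicy,
    default_lang: str,
    switched_lang: str,
    error_key: str,
) -> List[Tuple[str, str]]:
    """Return (language, text) pairs for the occurrence voice and the content voice."""
    if policy is ErrorVoicePolicy.BOTH_DEFAULT:
        notice_lang = default_lang
    else:
        notice_lang = switched_lang
    if policy is ErrorVoicePolicy.BOTH_SWITCHED:
        content_lang = switched_lang
    else:
        content_lang = default_lang
    return [
        (notice_lang, MESSAGES[notice_lang]["occurred"]),
        (content_lang, MESSAGES[content_lang][error_key]),
    ]
```

With a default language of "ja" and a switched language of "en", the `NOTICE_SWITCHED` policy yields the English notice followed by the Japanese content, which corresponds to the out-of-toner example given for modification (4).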

(5) The first information processing device 60 and the second information processing device 100 shown in FIG. 2 may be integrated. In this case, the smart speaker 20, the integrated information processing device, and the MFP 140 are connected to one another via a network or the like. An image forming system with this configuration has fewer constituent devices and, as a result, lower communication costs.

The embodiments disclosed herein should be considered exemplary in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims. In addition, the inventions described in the embodiments and the modifications are intended to be implemented either alone or in combination wherever possible.

11 processing unit, 14 paper feed unit, 20 smart speaker, 31 control unit, 32 fixed storage device, 34 operation panel, 35 speaker, 38 bus, 40 first network, 60 first information processing device, 80 second network, 100 second information processing device, 120 third network, 150 first providing device, 160 second providing device, 1402 acquisition unit, 1404 execution unit, 1406 switching unit, 1408 setting unit, 1410 speaker control unit, 1412 display control unit, 1416 prohibition unit.

Claims

An image forming system comprising a voice input/output device, an information processing device, and an image forming apparatus, wherein
the voice input/output device
receives input of a first voice and, upon input of the first voice, outputs a second voice, and
transmits the first voice,
the information processing device, upon receiving the first voice from the voice input/output device, transmits language information that can identify the language of the second voice, and
the image forming apparatus comprises:
a language acquisition unit that acquires the language of the second voice identified from the language information transmitted from the information processing device;
a setting unit that sets a language;
a switching unit that switches the language set by the setting unit to the language of the second voice acquired by the language acquisition unit; and
an output unit that outputs a third voice in the language switched to by the switching unit.

The image forming system according to claim 1, wherein
the information processing device detects the language of the first voice transmitted from the voice input/output device, and
the detected language of the first voice is transmitted as the language information and acquired as the language of the second voice.

The image forming system according to claim 1, wherein the information processing device transmits, as the language information, the language of the second voice set in the voice input/output device.

The image forming system according to any one of claims 1 to 3, further comprising a first providing device that provides voice data, wherein
the image forming apparatus comprises a voice data acquisition unit that, when the image forming apparatus does not store voice data corresponding to the language switched to by the switching unit, acquires that voice data from the first providing device, and
the output unit outputs the third voice based on the voice data acquired by the voice data acquisition unit.

The image forming system according to claim 4, wherein, when the output unit outputs the third voice before the voice data has been acquired from the first providing device, the output unit outputs the third voice in the language before switching by the switching unit rather than in the language switched to by the switching unit.

The image forming system according to any one of claims 1 to 5, wherein
the image forming apparatus further comprises:
a display unit that displays an image; and
a display control unit that controls the display unit, and
the display control unit controls the display unit so as to display an image in the language switched to by the switching unit.

The image forming system according to claim 6, further comprising a second providing device that provides image data, wherein
the image forming apparatus comprises an image data acquisition unit that, when the image forming apparatus does not store image data corresponding to the language switched to by the switching unit, acquires that image data from the second providing device, and
the display control unit causes the display unit to display an image based on the image data acquired by the image data acquisition unit.

The image forming system according to claim 6 or 7, wherein the display control unit controls the display unit so as to display an image in the language before switching by the switching unit when a job based on the first voice transmitted from the information processing device is completed.

The image forming system according to claim 6 or 7, wherein, when the image forming apparatus determines that the connection between the voice input/output device and the image forming apparatus has been disconnected, the display control unit controls the display unit so as to display an image in the language before switching by the switching unit.

The image forming system according to any one of claims 1 to 9, further comprising a prohibition unit that prohibits control for displaying an image in the language switched to by the switching unit.

The image forming system according to any one of claims 1 to 10, wherein the output unit is capable of outputting a voice in the language before switching by the switching unit when a job based on the first voice transmitted from the information processing device is completed.

The image forming system according to any one of claims 1 to 11, wherein, when an error occurs, the output unit outputs a voice indicating that the error has occurred and a voice indicating the content of the error in the language before switching by the switching unit.

The image forming system according to any one of claims 1 to 11, wherein, when an error occurs, the output unit outputs a voice indicating that the error has occurred in the language switched to by the switching unit, and outputs a voice indicating the content of the error in the language before switching by the switching unit.

The image forming system according to any one of claims 1 to 13, wherein the output unit outputs, as the third voice, a voice indicating that a job based on the voice transmitted from the voice input/output device is completed.

An image forming apparatus comprising:
a language acquisition unit that acquires language information that can identify the language of a second voice output from a voice input/output device when a first voice from a user is input to the voice input/output device;
a setting unit that sets a language;
a switching unit that switches the language set by the setting unit to the language of the second voice acquired by the language acquisition unit; and
an output unit that outputs a third voice in the language switched to by the switching unit.

The image forming apparatus according to claim 15, further comprising a voice data acquisition unit that, when voice data corresponding to the language switched to by the switching unit is not stored, acquires that voice data from a first providing device, wherein
the output unit outputs the third voice based on the voice data acquired by the voice data acquisition unit.

The image forming apparatus according to claim 16, wherein, when the output unit outputs the third voice before the voice data has been acquired from the first providing device, the output unit outputs the third voice in the language before switching by the switching unit rather than in the language switched to by the switching unit.

The image forming apparatus according to any one of claims 15 to 17, further comprising:
a display unit that displays an image; and
a display control unit that controls the display unit, wherein
the display control unit controls the display unit so as to display an image in the language switched to by the switching unit.

The image forming apparatus according to claim 18, further comprising an image data acquisition unit that, when the image forming apparatus does not store image data corresponding to the language switched to by the switching unit, acquires that image data from a second providing device, wherein
the display control unit causes the display unit to display an image based on the image data acquired by the image data acquisition unit.

The image forming apparatus according to claim 18 or 19, wherein the display control unit controls the display unit so as to display an image in the language before switching by the switching unit when a job based on the first voice is completed.

The image forming apparatus according to claim 18 or 19, wherein, when the image forming apparatus determines that the connection between the voice input/output device and the image forming apparatus has been disconnected, the display control unit controls the display unit so as to display an image in the language before switching by the switching unit.

The image forming apparatus according to any one of claims 15 to 21, further comprising a prohibition unit that prohibits control of displaying an image in a language switched by the switching unit.

The image forming apparatus according to any one of claims 15 to 22, wherein the output unit is capable of outputting a voice in the language before switching by the switching unit when a job based on the first voice is completed.

The image forming apparatus according to any one of claims 15 to 23, wherein, when an error occurs, the output unit outputs a voice indicating that the error has occurred and a voice indicating the content of the error in the language before switching by the switching unit.

The image forming apparatus according to any one of claims 15 to 23, wherein, when an error occurs, the output unit outputs a voice indicating that the error has occurred in the language switched to by the switching unit, and outputs a voice indicating the content of the error in the language before switching by the switching unit.

The image forming apparatus according to any one of claims 15 to 25, wherein the output unit outputs, as the third voice, a voice indicating that a job based on the voice transmitted from the voice input/output device is completed.

A control program for an image forming apparatus, the control program causing a computer to execute:
a step of acquiring language information that can identify the language of a second voice output from a voice input/output device when a first voice from a user is input to the voice input/output device;
a step of switching a set language to the language of the second voice; and
a step of outputting a third voice in the switched language.

A method of controlling an image forming apparatus, the method comprising:
a step of acquiring language information that can identify the language of a second voice output from a voice input/output device when a first voice from a user is input to the voice input/output device;
a step of switching a set language to the language of the second voice; and
a step of outputting a third voice in the switched language.