JP2020113197A

JP2020113197A - Information processing apparatus, information processing method, and information processing program

Info

Publication number: JP2020113197A
Application number: JP2019005363A
Authority: JP
Inventors: Shin Egami; Kazuki Kasai; Junichi Wada
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 2019-01-16
Filing date: 2019-01-16
Publication date: 2020-07-27
Anticipated expiration: 2039-01-16
Also published as: WO2020148920A1; JP7036046B2

Abstract

PROBLEM TO BE SOLVED: To evaluate the relationship between participants. SOLUTION: An information processing device 10 includes a facial expression information acquisition unit 13 that acquires facial expression information of a plurality of participants, a voice information acquisition unit 14 that acquires utterance information of the participants, a facial expression relationship information generation unit 15 that generates facial expression relationship information from the facial expression information, an utterance relationship information generation unit 16 that generates utterance relationship information from the utterance information, and a relationship information generation unit 17 that generates relationship information from the facial expression relationship information and the utterance relationship information. [Selected drawing] Fig. 1

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

Services that support smooth communication between users are known in the prior art. Patent Literature 1 describes a telephone voice monitoring and evaluation system that evaluates a call center operator's performance by recognizing emotions from the voices in a conversation between the operator and a customer and analyzing the voice in combination with the recognized emotions. Patent Literature 2 describes an emotion matching device that recognizes the emotions expressed in chat messages entered by users and computes the similarity of emotions between users.

Patent Literature 1: JP-A-2017-135642 (published August 3, 2017)
Patent Literature 2: JP-A-2005-284822 (published October 13, 2005)

However, because the conventional techniques described above recognize a speaker's emotions based only on the voice in a conversation or only on the chat messages entered, they cannot recognize emotions from multiple perspectives.

An object of one aspect of the present invention is to provide a communication support technique that recognizes the emotions of each user during a conversation from multiple perspectives and notifies the users of an evaluation of the conversation based on the recognized emotions.

To solve the above problem, an information processing device according to one aspect of the present invention includes: a facial expression information acquisition unit that acquires first facial expression information regarding the facial expression of a first participant among a plurality of participants and second facial expression information regarding the facial expression of a second participant among the plurality of participants; a voice information acquisition unit that acquires first utterance information regarding utterances of the first participant and second utterance information regarding utterances of the second participant; a facial expression relationship information generation unit that refers to the first facial expression information and the second facial expression information to generate facial expression relationship information indicating the relationship between the first participant and the second participant with respect to facial expressions; an utterance relationship information generation unit that refers to the first utterance information and the second utterance information to generate utterance relationship information indicating the relationship between the first participant and the second participant with respect to utterances; and a relationship information generation unit that refers to the facial expression relationship information and the utterance relationship information to generate relationship information, which is information indicating the relationship between the first participant and the second participant.

According to this configuration, the relationship between participants in a meeting can be evaluated based on both the voice information and the facial expression information of each participant.

In the information processing device according to the above aspect, the relationship information is real-time or time-series information indicating the relationship between the first participant and the second participant.

According to this configuration, the relationship between participants in a meeting can be evaluated in real time based on both the voice information and the facial expression information of each participant.

In the information processing device according to the above aspect, the first facial expression information includes a plurality of first indexes expressing the facial expression of the first participant, and the second facial expression information includes a plurality of second indexes expressing the facial expression of the second participant. The facial expression relationship information generation unit generates facial expression difference information regarding the difference between the first indexes and the second indexes, and includes the generated facial expression difference information in the facial expression relationship information.

According to this configuration, because a plurality of indexes is used to express each participant's facial expression when generating the facial expression relationship information, the participants' facial expressions can be expressed more accurately.
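The description does not say how the facial expression difference information is computed. As a minimal sketch, assuming each participant's expression is given as scores for the eight expression indexes listed later in this description (anger through surprise), the difference could be taken per index as an absolute value and summed into an overall L1 distance. The function name `expression_difference` and the use of absolute differences are illustrative assumptions.

```python
from typing import Dict, Mapping

# The eight expression indexes listed in this description.
EMOTIONS = ("anger", "contempt", "disgust", "fear",
            "happiness", "neutral", "sadness", "surprise")


def expression_difference(first: Mapping[str, float],
                          second: Mapping[str, float]) -> Dict[str, float]:
    """Hypothetical facial expression difference information.

    Returns the absolute difference for each expression index, plus the
    sum of those differences (an L1 distance) under the key "total".
    Indexes missing from either input are treated as 0.0 (an assumption).
    """
    diff = {e: abs(first.get(e, 0.0) - second.get(e, 0.0)) for e in EMOTIONS}
    diff["total"] = sum(diff[e] for e in EMOTIONS)
    return diff
```

A per-index breakdown keeps the information about which emotion differs, while the single total is convenient for plotting over time.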

In the information processing device according to the above aspect, the first facial expression information includes first line-of-sight information regarding the line-of-sight direction of the first participant, and the second facial expression information includes second line-of-sight information regarding the line-of-sight direction of the second participant. The facial expression relationship information generation unit refers to the first line-of-sight information and the second line-of-sight information to generate line-of-sight relationship information, and includes the generated line-of-sight relationship information in the facial expression relationship information.

According to this configuration, because the facial expression information referred to when generating the facial expression relationship information also includes the participants' line-of-sight information, the participants' facial expressions can be expressed more accurately.
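The description does not define how the line-of-sight relationship information is computed. One plausible reading, sketched here under stated assumptions, is a per-frame "mutual gaze" check for a remote meeting: each participant's gaze coordinates are compared with the on-screen rectangle where the other participant's face is shown, and a frame counts as a match only when both look at each other's face. The face rectangles, the assumption that both gaze series are already aligned frame by frame via their time stamps, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]              # gaze coordinate on a participant's screen
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) of partner's face


def looks_at(gaze: Optional[Point], face: Box) -> bool:
    """True if the gaze point lies inside the partner's on-screen face box."""
    if gaze is None:  # gaze not detected in this frame
        return False
    x, y = gaze
    left, top, right, bottom = face
    return left <= x <= right and top <= y <= bottom


def gaze_match_flags(first_gaze: Sequence[Optional[Point]],
                     second_gaze: Sequence[Optional[Point]],
                     face_on_first_screen: Box,
                     face_on_second_screen: Box) -> List[bool]:
    """Per-frame flag: both participants look at each other's face at once."""
    return [looks_at(g1, face_on_first_screen) and looks_at(g2, face_on_second_screen)
            for g1, g2 in zip(first_gaze, second_gaze)]


def gaze_match_rate(flags: Sequence[bool]) -> float:
    """Fraction of frames flagged as a gaze match (0.0 when there are no frames)."""
    return sum(flags) / len(flags) if flags else 0.0
```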

In the information processing device according to the above aspect, the utterance relationship information generation unit generates utterance time relationship information indicating the relationship between the utterance time of the first participant indicated by the first utterance information and the utterance time of the second participant indicated by the second utterance information, and includes the generated utterance time relationship information in the utterance relationship information.

According to this configuration, because the utterance time relationship information is also taken into account when generating the utterance relationship information, the participants' utterance relationship information can be generated more accurately.
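A minimal sketch of utterance time relationship information follows. It assumes the time stamps of the section time-series text data can be reduced to (start, end) times in seconds for each utterance; only total speaking time and each participant's share are computed, and overlapping speech is not treated specially. All names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) of one recognized utterance


def speaking_time(segments: Sequence[Segment]) -> float:
    """Total speaking time in seconds; reversed segments count as zero."""
    return sum(max(0.0, end - start) for start, end in segments)


def utterance_time_relationship(first: Sequence[Segment],
                                second: Sequence[Segment]) -> Dict[str, float]:
    """Hypothetical utterance time relationship information for two participants."""
    t1, t2 = speaking_time(first), speaking_time(second)
    total = t1 + t2
    return {
        "first_s": t1,
        "second_s": t2,
        "first_ratio": t1 / total if total else 0.0,   # share of total speech
        "second_ratio": t2 / total if total else 0.0,
    }
```

For example, a first participant speaking for 60 s and a second for 30 s gives ratios of about 0.67 and 0.33.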

In the information processing device according to the above aspect, the utterance relationship information generation unit determines whether at least one of the first utterance information and the second utterance information contains utterance content belonging to a specific category, and includes information corresponding to the determination result in the utterance relationship information.

According to this configuration, because the utterance relationship information also includes information corresponding to the result of determining whether utterance content belonging to a specific category is present, the relationship information between the participants can be generated more accurately.
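The specific categories are not named in the description. The sketch below assumes a hypothetical keyword dictionary (`CATEGORIES`, with illustrative "gratitude" and "apology" entries) and checks, for each category, whether any recognized utterance of either participant contains one of its keywords. Matching lowercase English word tokens is a simplification; Japanese text would need a morphological analyzer to split it into words.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical category dictionary; the description does not list categories.
CATEGORIES: Dict[str, Set[str]] = {
    "gratitude": {"thanks", "thank", "appreciate"},
    "apology": {"sorry", "apologize", "apologies"},
}


def tokens(text: str) -> List[str]:
    """Lowercase word tokens (simplified English-only tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def detect_categories(*utterance_sets: Iterable[str]) -> Dict[str, bool]:
    """For each category, True if any utterance of any participant hits a keyword."""
    found = {name: False for name in CATEGORIES}
    for utterances in utterance_sets:
        for text in utterances:
            words = set(tokens(text))
            for name, keywords in CATEGORIES.items():
                if words & keywords:
                    found[name] = True
    return found
```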

In the information processing device according to the above aspect, the utterance relationship information generation unit extracts, from at least one of the first utterance information and the second utterance information, words that appear relatively frequently within a predetermined time, and includes the extracted words in the utterance relationship information.

According to this configuration, because the utterance relationship information also includes information on frequently occurring words, the relationship information between the participants can be generated more accurately.
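One way to extract relatively frequent words within a predetermined time is to count word tokens from utterances whose time stamps fall inside a window ending at the current time. In this sketch the window length, the stop-word list, the tokenizer, and `top_k` are all assumptions; `collections.Counter.most_common` returns words with equal counts in first-encountered order.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple

TimedText = Tuple[float, str]  # (utterance time in seconds, recognized text)

# Illustrative stop words only; a real list would depend on the language.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "is", "i", "you", "it"}


def frequent_words(utterances: Sequence[TimedText], now: float,
                   window_s: float = 60.0, top_k: int = 3) -> List[Tuple[str, int]]:
    """Most frequent words in utterances with time stamps in [now - window_s, now]."""
    counts = Counter()
    for t, text in utterances:
        if now - window_s <= t <= now:
            counts.update(w for w in re.findall(r"[a-z']+", text.lower())
                          if w not in STOP_WORDS)
    return counts.most_common(top_k)
```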

In the information processing device according to the above aspect, the relationship information generation unit refers to the relationship information and generates presentation information to be presented to at least one of the first participant and the second participant.

According to this configuration, presenting the presentation information to the participants lets them recognize the relationship information.

The presentation information includes information indicating the ratio between the utterance time of the first participant and the utterance time of the second participant, and information on the change over time in the match rate between the line-of-sight direction of the first participant and that of the second participant.

According to this configuration, presenting the presentation information lets the participants recognize each participant's share of the utterance time and the change over time in the match rate of their line-of-sight directions.
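The presentation information could be assembled roughly as sketched below. The sketch assumes per-frame gaze match flags have already been computed (True when the participants' gaze directions are judged to match) and splits them into non-overlapping windows to show how the match rate changes over time. The window size, the text layout, and the function names are hypothetical; the description only states that the speaking-time ratio and the change over time in the gaze match rate are included.

```python
from typing import List, Sequence


def match_rate_over_time(flags: Sequence[bool], window: int) -> List[float]:
    """Gaze match rate in consecutive, non-overlapping windows of `window` frames."""
    if window <= 0:
        raise ValueError("window must be positive")
    rates = []
    for start in range(0, len(flags), window):
        chunk = flags[start:start + window]  # last chunk may be shorter
        rates.append(sum(chunk) / len(chunk))
    return rates


def presentation_text(first_ratio: float, rates: Sequence[float]) -> str:
    """Render the speaking-time ratio and the match-rate trend as plain text."""
    lines = [f"Speaking time: first {first_ratio:.0%} / second {1 - first_ratio:.0%}"]
    lines += [f"Window {i + 1}: gaze match {r:.0%}" for i, r in enumerate(rates)]
    return "\n".join(lines)
```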

In the information processing device according to the above aspect, the facial expression relationship information generation unit and the utterance relationship information generation unit generate the facial expression relationship information and the utterance relationship information by additionally referring to participant information indicating attributes of the first and second participants.

According to this configuration, because the participants' attributes are also referred to when generating the facial expression relationship information and the utterance relationship information, more accurate facial expression relationship information and utterance relationship information can be generated.

To solve the above problem, an information processing method according to one aspect of the present invention includes: a facial expression information acquisition step of acquiring first facial expression information regarding the facial expression of a first participant among a plurality of participants and second facial expression information regarding the facial expression of a second participant among the plurality of participants; a voice information acquisition step of acquiring first utterance information regarding utterances of the first participant and second utterance information regarding utterances of the second participant; a facial expression relationship information generation step of referring to the first facial expression information and the second facial expression information to generate facial expression relationship information indicating the relationship between the first participant and the second participant with respect to facial expressions; an utterance relationship information generation step of referring to the first utterance information and the second utterance information to generate utterance relationship information indicating the relationship between the first participant and the second participant with respect to utterances; and a relationship information generation step of referring to the facial expression relationship information and the utterance relationship information to generate relationship information, which is real-time or time-series information indicating the relationship between the first participant and the second participant.

According to this method, the relationship between participants in a meeting can be evaluated based on the voice information and the facial expression information of each participant.

To solve the above problem, an information processing program according to one aspect of the present invention causes a computer to function as the information processing device according to any of the above aspects, that is, as the facial expression information acquisition unit, the voice information acquisition unit, the facial expression relationship information generation unit, the utterance relationship information generation unit, and the relationship information generation unit.

According to one aspect of the present invention, the relationship between participants in a meeting can be evaluated based on the voice information and the facial expression information of each participant.

A block diagram showing an example of the components of an information processing system including an information processing device according to an embodiment of the present invention.
A diagram showing an overview of an information processing system including an information processing device according to an embodiment of the present invention.
A diagram showing an overview of the data flow in an information processing system including an information processing device according to an embodiment of the present invention.
A diagram showing an example of information presented by an information processing system including an information processing device according to an embodiment of the present invention.
A diagram showing another example of information presented by an information processing system including an information processing device according to an embodiment of the present invention.

[Embodiment 1]
An embodiment of the present invention is described in detail below. FIG. 1 is a diagram showing an overview of an information processing system 100 including the information processing device 10 of this embodiment. As shown in FIG. 1, the information processing system 100 includes an information processing device 10, a first terminal device 20, and a second terminal device 30. The number of terminal devices is not limited to two; three or more may be used.

FIG. 2 is a diagram showing an overview of the information processing system 100 including the information processing device 10 according to an embodiment of the present invention. As shown in FIG. 2, the information processing system 100 evaluates the relationship between a first participant 200 who uses the first terminal device 20 and a second participant 201 who uses the second terminal device 30.

In the information processing system 100, the information processing device 10 evaluates the relationship between the first participant 200 and the second participant 201 during a meeting, based on the facial expression information and utterance information of the two participants obtained from the first terminal device 20 and the second terminal device 30. The information processing system 100 displays the evaluation result on at least one of the first terminal device 20 and the second terminal device 30, thereby feeding the result back in real time to at least one of the first participant 200 and the second participant 201 and encouraging them to improve their communication during the meeting.

In the present embodiment, a "meeting" is not limited to a meeting in the narrow sense; it also includes one-on-one meetings, job interviews, counseling sessions, medical interviews, customer service, visits, consultations, and the like. Examples include:
・Meetings between a supervisor and a subordinate
・Medical interviews of a patient by a doctor
・Counseling of a client by a counselor
・Customer service and over-the-counter consultation by store staff
・Remote communication such as Web meetings
・Communication with video content, such as e-learning

FIG. 3 is a diagram showing an overview of the data flow in the information processing system 100.

[First Terminal Device 20]
As shown in FIG. 1, the first terminal device 20 includes a camera 21, a microphone 22, a display unit 23, a control unit 24, a speaker 25, and a communication unit 26.

<Video acquisition process>
The camera 21 captures images of the first participant and supplies the captured images to the control unit 24. The images captured by the camera 21 are preferably moving images. In that case, as shown in FIG. 3, the camera 21 supplies the control unit 24 with at least one of a video file and a video file list, which is a list of the images contained in the video file. The camera 21 also supplies the control unit 24 with time stamps indicating the capture time of each image contained in the video file.

The first terminal device 20 may include a plurality of cameras; in that case, the control unit 24 can identify the camera 21 by referring to camera device identification information.

<Image recognition processing>
The control unit 24 performs image recognition processing by referring to the video file, the video file list, and the time stamps supplied from the camera 21.

As an example, as shown in FIG. 3, the control unit 24 calculates time-series facial expression values, time-series face part coordinates, and time-series gaze coordinates by performing image recognition processing that refers to the video file, the video file list, and the time stamps. The time-series facial expression values, time-series face part coordinates, and time-series gaze coordinates are examples of the first facial expression information regarding the facial expression of the first participant.

<Voice acquisition process>
The microphone 22 mainly picks up the voice of the first participant and supplies the control unit 24 with an audio file representing the picked-up voice and time stamps for identifying when each utterance occurs in the audio file.

The first terminal device 20 may include a plurality of microphones; in that case, the control unit 24 can identify the microphone 22 by referring to audio device identification information.

<Utterance recognition process>
The control unit 24 performs utterance recognition processing by referring to the audio file and the time stamps supplied from the microphone 22.

As an example, as shown in FIG. 3, the control unit 24 generates section time-series text data by performing utterance recognition processing that refers to the audio file and the time stamps. The section time-series text data represents, as time-series text data, mainly the content uttered by the first participant. The section time-series text data is an example of the first utterance information indicating the utterances of the first participant.

[Second Terminal Device 30]
The second terminal device 30 includes a camera 31, a microphone 32, a display unit 33, a control unit 34, a speaker 35, and a communication unit 36.

<Video acquisition process>
The camera 31 captures images of the second participant and supplies the captured images to the control unit 34. The images captured by the camera 31 are preferably moving images. In that case, as shown in FIG. 3, the camera 31 supplies the control unit 34 with at least one of a video file and a video file list, which is a list of the images contained in the video file. The camera 31 also supplies the control unit 34 with time stamps indicating the capture time of each image contained in the video file.

The second terminal device 30 may include a plurality of cameras; in that case, the control unit 34 can identify the camera 31 by referring to camera device identification information.

<Image recognition processing>
The control unit 34 performs image recognition processing by referring to the video file, the video file list, and the time stamps supplied from the camera 31.

As an example, as shown in FIG. 3, the control unit 34 calculates time-series facial expression values, time-series face part coordinates, and time-series gaze coordinates by performing image recognition processing that refers to the video file, the video file list, and the time stamps. The time-series facial expression values, time-series face part coordinates, and time-series gaze coordinates are examples of the second facial expression information regarding the facial expression of the second participant.

<Voice acquisition process>
The microphone 32 mainly picks up the voice of the second participant and supplies the control unit 34 with an audio file representing the picked-up voice and time stamps for identifying when each utterance occurs in the audio file.

The second terminal device 30 may include a plurality of microphones; in that case, the control unit 34 can identify the microphone 32 by referring to audio device identification information.

<Utterance recognition process>
The control unit 34 performs utterance recognition processing by referring to the audio file and the time stamps supplied from the microphone 32.

As an example, as shown in FIG. 3, the control unit 34 generates section time-series text data by performing utterance recognition processing that refers to the audio file and the time stamps. The section time-series text data represents, as time-series text data, mainly the content uttered by the second participant. The section time-series text data is an example of the second utterance information indicating the utterances of the second participant.

[Information Processing Device 10]
The information processing device 10 includes a facial expression information acquisition unit 13, a voice information acquisition unit 14, a facial expression relationship information generation unit 15, an utterance relationship information generation unit 16, and a relationship information generation unit 17. The information processing device 10 further includes a communication unit 11. The facial expression information acquisition unit 13, the voice information acquisition unit 14, the facial expression relationship information generation unit 15, the utterance relationship information generation unit 16, and the relationship information generation unit 17 are included in a calculation unit 12.
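The unit structure can be pictured with the skeleton below. It is an illustrative Python sketch, not the patent's implementation: each method stands in for one unit, facial expression information is reduced to index scores, utterance information is reduced to total speaking time, and the two relationship computations are placeholder choices (a summed score difference and a speaking-time share) made only to show the data flow.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Tuple


@dataclass
class ParticipantData:
    """Hypothetical per-participant input received via communication unit 11."""
    expression: Mapping[str, float]  # facial expression information (index scores)
    speaking_s: float                # utterance information, reduced to speaking time


class InformationProcessingDevice:
    """Illustrative skeleton of units 13-17 inside calculation unit 12."""

    def acquire_expressions(self, p1: ParticipantData, p2: ParticipantData
                            ) -> Tuple[Mapping[str, float], Mapping[str, float]]:
        # Facial expression information acquisition unit 13
        return p1.expression, p2.expression

    def acquire_utterances(self, p1: ParticipantData, p2: ParticipantData
                           ) -> Tuple[float, float]:
        # Voice information acquisition unit 14
        return p1.speaking_s, p2.speaking_s

    def expression_relationship(self, e1: Mapping[str, float],
                                e2: Mapping[str, float]) -> Dict[str, float]:
        # Facial expression relationship information generation unit 15 (placeholder metric)
        keys = set(e1) | set(e2)
        return {"expression_distance":
                sum(abs(e1.get(k, 0.0) - e2.get(k, 0.0)) for k in keys)}

    def utterance_relationship(self, s1: float, s2: float) -> Dict[str, float]:
        # Utterance relationship information generation unit 16 (placeholder metric)
        total = s1 + s2
        return {"first_speaking_ratio": s1 / total if total else 0.0}

    def relationship(self, p1: ParticipantData, p2: ParticipantData
                     ) -> Dict[str, float]:
        # Relationship information generation unit 17: combine both results
        expr = self.expression_relationship(*self.acquire_expressions(p1, p2))
        utt = self.utterance_relationship(*self.acquire_utterances(p1, p2))
        return {**expr, **utt}
```

Keeping each unit as a separate method mirrors the description's split between acquisition, per-modality relationship generation, and final combination, so any one step can be replaced independently.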

(Facial expression information acquisition unit 13)
The facial expression information acquisition unit 13 acquires, via the communication unit 11, first facial expression information regarding the facial expression of the first participant among the plurality of meeting participants and second facial expression information regarding the facial expression of the second participant among the plurality of meeting participants.

<Numerical data cleaning process>
As an example, the facial expression information acquisition unit 13 refers, via the communication unit 11, to the time-series numerical data included in the facial expression information of the first participant, namely the time-series facial expression values, time-series face part coordinates, and time-series gaze coordinates, and cleans the numerical data by applying, for example, the following processing:
・Delete invalid data sections
・Average the data in valid data sections
・Convert the data into variances and numbers of terms
By performing this numerical data cleaning process, the facial expression information acquisition unit 13 generates section time-series numerical data for the first participant. The section time-series numerical data includes the time-series facial expression values, time-series face part coordinates, and time-series gaze coordinates in the valid sections.
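The cleaning rules are given only at the level of the three steps above. A minimal sketch follows, assuming that an invalid sample is one where detection failed (represented as None or NaN), that valid sections shorter than a hypothetical `min_len` are also discarded, and that "variance" means the population variance of each valid section.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[float, Optional[float]]  # (time stamp in s, value; None if not detected)


def valid_sections(samples: Sequence[Sample], min_len: int = 2
                   ) -> List[List[Tuple[float, float]]]:
    """Split a series at invalid samples (None or NaN) and drop short sections."""
    sections: List[List[Tuple[float, float]]] = []
    current: List[Tuple[float, float]] = []
    for t, v in samples:
        if v is None or math.isnan(v):
            if len(current) >= min_len:
                sections.append(current)
            current = []
        else:
            current.append((t, v))
    if len(current) >= min_len:
        sections.append(current)
    return sections


def clean_series(samples: Sequence[Sample], min_len: int = 2
                 ) -> List[Dict[str, float]]:
    """Summarize each valid section by its mean, variance, and number of terms."""
    result = []
    for section in valid_sections(samples, min_len):
        values = [v for _, v in section]
        result.append({
            "start": section[0][0],
            "end": section[-1][0],
            "mean": statistics.mean(values),
            "variance": statistics.pvariance(values),  # population variance
            "n": len(values),
        })
    return result
```

The same function would be applied to each time-series value (expression values, face part coordinates, gaze coordinates) separately.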

The facial expression information acquisition unit 13 performs the same process on the facial expression information regarding the second participant. It thereby generates section time-series numerical data regarding the second participant.

<Detection of facial expression>
The facial expression information acquisition unit 13 refers to the section time-series numerical data regarding the first participant and calculates a plurality of first indexes expressing the facial expression of the first participant. Likewise, it refers to the section time-series numerical data regarding the second participant and calculates a plurality of second indexes expressing the facial expression of the second participant.

Examples of indexes expressing a facial expression include the following:
・Anger
・Contempt
・Disgust
・Fear
・Happiness
・Neutral
・Sadness
・Surprise
An index expressing a facial expression can therefore also be regarded as an index expressing the emotion that the facial expression indicates.

The facial expression information acquisition unit 13 may also use the time-series facial expression values included in the section time-series numerical data regarding the first participant directly as the plurality of first indexes. Similarly, it may use the time-series facial expression values included in the section time-series numerical data regarding the second participant directly as the plurality of second indexes.

The facial expressions of the first participant and the second participant can also be expressed as vectors whose components are the indexes listed above. Such vectors are also referred to as facial expression vectors.

The present embodiment is not limited to any particular technique for detecting each participant's facial expression, or for indexing and expressing the emotion that the detected expression indicates. Known techniques can be used, for example.

<Detection of line of sight>
The facial expression information acquisition unit 13 also acquires, via the communication unit 11, information regarding the line-of-sight directions of the first participant and the second participant. It obtains this information from the first terminal device 20 and the second terminal device 30. Specifically, as an example, the unit acquires the time-series line-of-sight coordinates included in the section time-series numerical data regarding the first participant, described above, as information regarding the first participant's line-of-sight direction. Similarly, it acquires the time-series line-of-sight coordinates included in the section time-series numerical data regarding the second participant as information regarding the second participant's line-of-sight direction.

The method of acquiring the line-of-sight coordinates is not particularly limited. In one example, a point light source (not shown) is provided in each of the first terminal device 20 and the second terminal device 30. The camera 21 and the camera 31 then photograph the corneal reflection image of light from the point light source for a predetermined time, from which the user's line-of-sight coordinates are acquired. The type of point light source is not particularly limited either; visible light and infrared light are examples. Using an infrared LED, for example, allows the line-of-sight coordinates to be acquired without causing the user discomfort.

<Detection of distance>
The facial expression information acquisition unit 13 may also acquire the time-series face part coordinates of the first participant included in the section time-series numerical data. From these, it may calculate the distance between the first participant and the imaging means (camera 21). Likewise, it may acquire the time-series face part coordinates of the second participant included in the section time-series numerical data and calculate the distance between the second participant and the imaging means (camera 31). The distance between a participant and the imaging means can be calculated, for example, as the reciprocal of the eye-corner distance. Here, the eye-corner distance is the distance between the outer corners of the eyes in the captured image, obtained from the face part coordinates and corrected for the face angle.
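The reciprocal-of-eye-corner-distance estimate can be sketched as follows. The face-angle correction is an assumption: the projected outer-eye-corner distance is divided by the cosine of the head's yaw angle. The result is a relative distance in inverse pixels, not a metric distance. The function name and its arguments are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixels of the captured image

def relative_distance(left_outer: Point, right_outer: Point, yaw_deg: float) -> float:
    """Return 1 / (face-angle-corrected outer-eye-corner distance).

    Assumption: turning the head by `yaw_deg` shortens the projected eye-corner
    distance by cos(yaw), so dividing by cos(yaw) approximates the frontal value.
    """
    if not -90.0 < yaw_deg < 90.0:
        raise ValueError("yaw must be strictly between -90 and 90 degrees")
    projected = math.dist(left_outer, right_outer)  # eye-corner distance in pixels
    if projected == 0:
        raise ValueError("eye corners coincide")
    corrected = projected / math.cos(math.radians(yaw_deg))
    return 1.0 / corrected  # larger value = participant farther from the camera
```

Because the eyes look larger as a participant approaches the camera, the returned value shrinks as the participant moves closer. Any threshold applied to it must use the same relative units.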

(Voice information acquisition unit 14)
The voice information acquisition unit 14 acquires first utterance information regarding the utterances of the first participant. It also acquires second utterance information regarding the utterances of the second participant among the plurality of conference participants. That is, the voice information acquisition unit 14 acquires, via the communication unit 11, information about the utterances of both participants from the first terminal device 20 and the second terminal device 30.

As an example, the voice information acquisition unit 14 acquires the time-series utterance text included in the section time-series text data regarding the first participant described above. Similarly, it acquires the time-series utterance text included in the section time-series text data regarding the second participant.

As a further example, the voice information acquisition unit 14 acquires the time-series utterance text regarding the first participant, together with the time-series face part coordinates at the moment of each utterance. The unit refers to these coordinates: if the first participant's mouth is open at the moment of an utterance, it includes that utterance text in the first utterance information. Similarly, the unit acquires the time-series utterance text regarding the second participant and the corresponding time-series face part coordinates. If the second participant's mouth is open at the moment of an utterance, it includes that utterance text in the second utterance information. As a result, the speaker can be identified even when a simple, non-directional microphone is used as the microphone 22 or the microphone 32.
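The mouth-opening check used to attribute utterances can be sketched as follows. Several choices here are hypothetical and only illustrate the idea:
・the landmark names (`upper_lip`, `lower_lip`, `mouth_left`, `mouth_right`);
・the opening-ratio threshold of 0.2;
・the requirement that face-part frames be keyed by the same timestamps as the utterance text.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Landmarks = Dict[str, Point]  # hypothetical landmark name -> (x, y)

def mouth_open(landmarks: Landmarks, ratio_threshold: float = 0.2) -> bool:
    """Hypothetical test: vertical lip gap divided by mouth width exceeds a threshold."""
    gap = abs(landmarks["lower_lip"][1] - landmarks["upper_lip"][1])
    width = abs(landmarks["mouth_right"][0] - landmarks["mouth_left"][0])
    return width > 0 and gap / width > ratio_threshold

def attribute_utterances(
    utterances: List[Tuple[float, str]],
    face_parts: Dict[str, Dict[float, Landmarks]],
) -> Dict[str, List[str]]:
    """Add each (timestamp, text) utterance to the utterance information of every
    participant whose mouth is open at that timestamp."""
    info: Dict[str, List[str]] = {pid: [] for pid in face_parts}
    for t, text in utterances:
        for pid, frames in face_parts.items():
            landmarks = frames.get(t)  # face-part coordinates at the moment of the utterance
            if landmarks is not None and mouth_open(landmarks):
                info[pid].append(text)
    return info
```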

(Facial expression relationship information generation unit 15)
The facial expression relationship information generation unit 15 refers to the first facial expression information and the second facial expression information. From these, it generates facial expression relationship information indicating the relationship between the first participant and the second participant with respect to facial expression.

A participant's satisfaction with a conference depends not only on its content and conclusions but also on whether the participants communicated well. The communication state between participants is represented by their relationship during the conference, and that relationship can be evaluated by how closely their emotions agree. The facial expression relationship information generation unit 15 acquires the first and second facial expression information from the facial expression information acquisition unit 13. It uses this information to evaluate, from both participants' facial expressions, how closely their emotions agree during the conference. In this way, it evaluates the communication state between the participants in real time.

The facial expression information that the unit 15 acquires from the facial expression information acquisition unit 13 is calculated from the section time-series numerical data. It is therefore based on each participant's facial expression in real time or over time. Because the unit 15 generates the facial expression relationship information from this information, the result represents the participants' facial-expression relationship in real time or over time.

<Facial expression matching rate determination>
As described above, the first facial expression information includes a plurality of first indexes expressing the facial expression of the first participant. The second facial expression information includes a plurality of second indexes expressing the facial expression of the second participant.

The facial expression relationship information generation unit 15 may generate facial expression difference information regarding the difference between the first indexes and the second indexes. It may then include this difference information in the facial expression relationship information.

As an example, the unit 15 calculates a facial expression mismatch amount from two vectors. The first facial expression vector has as its elements the indexes expressing the first participant's facial expression. The second facial expression vector has as its elements the indexes expressing the second participant's facial expression. The mismatch amount is the absolute value of the difference between these two vectors. It can be regarded as an index of how harmonized the participants' emotions are during the conference. The unit 15 may also calculate a facial expression matching rate: the proportion of time, from the start of the conference to the present, during which the facial expressions matched. For example, the matching rate is obtained by subtracting the time during which the expressions did not match from the elapsed time, and then dividing the result by the elapsed time.
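One possible reading of the mismatch amount and matching rate is sketched below, under three assumptions. First, the "absolute value of the difference" is taken as the Euclidean norm of the difference between the two facial expression vectors. Second, samples are taken at a fixed interval `dt`. Third, a sample counts as not matching when the mismatch amount exceeds a hypothetical `threshold`.

```python
import math
from typing import List, Sequence

def mismatch_amount(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Euclidean norm of v1 - v2 (one reading of 'absolute value of the difference')."""
    return math.dist(v1, v2)

def matching_rate(
    series1: List[Sequence[float]],
    series2: List[Sequence[float]],
    dt: float,
    threshold: float,
) -> float:
    """(elapsed time - mismatched time) / elapsed time, for samples taken every dt seconds."""
    if not series1 or len(series1) != len(series2):
        raise ValueError("series must be non-empty and of equal length")
    elapsed = dt * len(series1)
    mismatched = dt * sum(
        1 for a, b in zip(series1, series2) if mismatch_amount(a, b) > threshold
    )
    return (elapsed - mismatched) / elapsed
```

With eight-component vectors ordered as in the index list (anger through surprise), a purely "happiness" vector and a purely "neutral" vector are a distance of √2 apart.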

<Gaze matching rate determination>
The first facial expression information may also include first line-of-sight information regarding the first participant's line-of-sight direction. Likewise, the second facial expression information may include second line-of-sight information regarding the second participant's line-of-sight direction. The unit 15 may generate line-of-sight relationship information by referring to the first and second line-of-sight information. It may then include this line-of-sight relationship information in the facial expression relationship information.

As an example, the unit 15 calculates, as the line-of-sight relationship information, the gaze matching rate between the first participant and the second participant. The gaze matching rate can be regarded as an index of how attentive the participants are to each other during the conference. More specifically, the eye positions are first located in the conference room. The control unit 24 or the unit 15 identifies the position of the first participant's eyes by analyzing the image captured by the camera 21. The control unit 34 or the unit 15 identifies the position of the second participant's eyes by analyzing the image captured by the camera 31.

At each point in time, the unit 15 then makes two determinations. It determines whether the first participant's line-of-sight direction, indicated by the first line-of-sight information, is directed at the second participant's eyes. It also determines whether the second participant's line-of-sight direction, indicated by the second line-of-sight information, is directed at the first participant's eyes. From these, it determines whether the two participants' lines of sight meet at that point in time.

As an example, when the unit 15 determines that the first participant's line of sight is directed at the second participant's eyes, it sets the first participant's line-of-sight flag to 1. Likewise, when it determines that the second participant's line of sight is directed at the first participant's eyes, it sets the second participant's line-of-sight flag to 1. The unit 15 determines that the lines of sight meet when both flags are 1.

The unit 15 then calculates the gaze matching rate: the proportion of time, from the start of the conference to the present, during which the lines of sight met. For example, it is obtained by dividing the time during which the lines of sight met by the time elapsed since the start of the conference.
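The flag logic and the gaze matching rate can be sketched as follows, under stated assumptions:
・eye positions and gaze directions are 3-D vectors in a common conference-room coordinate system;
・a gaze is "directed at" the other participant's eyes when the angle between the gaze direction and the direction to those eyes is within a hypothetical 10-degree tolerance;
・frames are sampled at equal intervals, so the ratio of frames equals the ratio of time.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

def looks_at(eye: Vec3, gaze_dir: Vec3, target: Vec3, max_angle_deg: float = 10.0) -> bool:
    """True if gaze_dir points at target within max_angle_deg (hypothetical tolerance)."""
    to_target = [t - e for t, e in zip(target, eye)]
    norm = math.hypot(*gaze_dir) * math.hypot(*to_target)
    if norm == 0:
        return False
    cos_angle = sum(g * v for g, v in zip(gaze_dir, to_target)) / norm
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg

def gaze_matching_rate(frames: Sequence[Tuple[Vec3, Vec3, Vec3, Vec3]]) -> float:
    """frames: one (eye1, gaze1, eye2, gaze2) tuple per equally spaced sample."""
    if not frames:
        raise ValueError("no frames")
    matched = 0
    for eye1, gaze1, eye2, gaze2 in frames:
        flag1 = 1 if looks_at(eye1, gaze1, eye2) else 0  # first participant -> second's eyes
        flag2 = 1 if looks_at(eye2, gaze2, eye1) else 0  # second participant -> first's eyes
        if flag1 == 1 and flag2 == 1:  # lines of sight meet only when both flags are 1
            matched += 1
    return matched / len(frames)
```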

The determination of whether the lines of sight are directed at each other's eyes may also refer to position information. This information indicates the relative positional relationship between the first terminal device 20 and the second terminal device 30.

Alternatively, the lines of sight may be judged to meet when each participant looks toward the other's face or body, not necessarily at the other's eyes.

When the participants meet over the Internet or a similar network, the gaze matching rate is calculated from where the participants look on the screens of their terminal devices. More specifically, as an example, the position of the second participant's face on the display screen of the first terminal device 20 is identified as coordinates on that screen. When the first participant's line of sight is directed at those coordinates, the first participant's line-of-sight flag is set to 1. Similarly, the position of the first participant's face on the display screen of the second terminal device 30 is identified as coordinates on that screen. When the second participant's line of sight is directed at those coordinates, the second participant's line-of-sight flag is set to 1.
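For the on-screen case, a minimal sketch treats the displayed face as a rectangle. A participant's flag is set when their on-screen gaze point falls inside the rectangle showing the other participant's face. The rectangle representation and the function names are assumptions for illustration.

```python
from typing import Tuple

Point = Tuple[float, float]              # on-screen gaze point (x, y) in pixels
Box = Tuple[float, float, float, float]  # displayed face region (left, top, right, bottom)

def screen_gaze_flag(gaze_xy: Point, face_box: Box) -> int:
    """1 if the on-screen gaze point lies inside the displayed face region, else 0."""
    x, y = gaze_xy
    left, top, right, bottom = face_box
    return 1 if left <= x <= right and top <= y <= bottom else 0

def screen_gaze_matched(gaze1: Point, face2_on_screen1: Box,
                        gaze2: Point, face1_on_screen2: Box) -> bool:
    """Lines of sight meet when each participant looks at the other's face on their own screen."""
    return (screen_gaze_flag(gaze1, face2_on_screen1) == 1
            and screen_gaze_flag(gaze2, face1_on_screen2) == 1)
```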

<Forward-leaning rate determination>
The unit 15 may also calculate forward-leaning rates for the first participant and the second participant and include them in the facial expression relationship information. The forward-leaning rate can be regarded as an index of how interested a participant is in the other participants' utterances during the conference. As an example, each participant's distance from their respective imaging means is monitored. When that distance falls below a threshold within a preset fixed time, the unit 15 determines that the participant is leaning forward.

The unit 15 then determines each participant's forward-leaning rate. The first participant's rate is the proportion of time, from the start of the conference to the present, during which the first participant was leaning forward. The second participant's rate is the corresponding proportion for the second participant.
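The forward-leaning determination and rate can be sketched as follows. The source leaves the "preset fixed time" criterion open, so this sketch makes assumptions. A participant counts as leaning forward for every frame of a run in which the distance stays below `threshold` for at least `min_frames` consecutive frames. Frames are assumed to be equally spaced, and the distances are in the same relative units as the eye-corner-based estimate. Both function names are hypothetical.

```python
from typing import List, Sequence

def forward_lean_flags(distances: Sequence[float], threshold: float, min_frames: int) -> List[bool]:
    """Mark every frame of a below-threshold run that lasts at least min_frames frames."""
    flags = [False] * len(distances)
    start = None  # index where the current below-threshold run began
    for i, d in enumerate(list(distances) + [float("inf")]):  # sentinel closes the last run
        if d < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:  # run was long enough to count as leaning forward
                flags[start:i] = [True] * (i - start)
            start = None
    return flags

def forward_lean_rate(distances: Sequence[float], threshold: float, min_frames: int) -> float:
    """Fraction of equally spaced frames since the conference started spent leaning forward."""
    if not distances:
        raise ValueError("no distance samples")
    flags = forward_lean_flags(distances, threshold, min_frames)
    return sum(flags) / len(flags)
```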

The unit 15 may also estimate each participant's posture during the conference. The basis is the face image size, which is derived from the participant's distance to their imaging means. The unit observes how this size changes within a preset fixed time and includes the resulting posture in the facial expression relationship information. The posture can be regarded as an index of whether a participant's attitude is appropriate for listening to the other participants' utterances.

Further, the unit 15 may calculate the correlation between changes in the first participant's posture and changes in the second participant's second facial expression vector. It may include this correlation in the facial expression relationship information. The correlation can be regarded as an index of how one participant's posture affects another participant's facial expression. Similarly, the unit 15 may calculate the correlation between changes in the second participant's posture and changes in the first participant's first facial expression vector, and include it as well.
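A minimal sketch of the posture/expression correlation follows, under stated assumptions:
・one participant's posture is summarized per frame as a single scalar, for example face image size;
・the other participant's facial expression vectors are time-aligned with those frames;
・"change" means the magnitude of the frame-to-frame difference;
・the Pearson coefficient serves as the correlation measure.
These are illustrative choices, not the specified method.

```python
import math
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant (assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def posture_expression_correlation(
    posture: Sequence[float], expressions: Sequence[Sequence[float]]
) -> float:
    """Correlate one participant's posture changes with the other's expression-vector changes."""
    posture_change = [abs(b - a) for a, b in zip(posture, posture[1:])]
    expression_change = [math.dist(a, b) for a, b in zip(expressions, expressions[1:])]
    return pearson(posture_change, expression_change)
```

The same `pearson` helper could be reused for the environment-versus-average-expression correlation described later, by substituting the corresponding series.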

The unit 15 may also compare the postures of the first and second participants, calculate how similar their posture states are, and include this similarity in the facial expression relationship information. Posture similarity reflects mirroring. It can be regarded as an index of how interested a participant is in the other participants' utterances during the conference.

The unit 15 may also refer to participant information indicating the attributes of the first and second participants when generating the facial expression relationship information. This participant information includes at least one of the following: age, sex, blood type, personality, place of origin, family relationships, job title, years of service, number of job changes, and job history. It also includes the participant's usage history of the system.

As an example, the unit 15 may determine from the participant information that a participant tends to show a particular facial expression. In that case, it may correct the participant's facial expression vector by multiplying the index for that expression by a weight coefficient smaller than 1. It then generates the facial expression relationship information from the corrected vector.
For example, suppose the participant information indicates that the first participant is shy. The unit 15 may then multiply the first participant's “neutral” index by a weight of 0.8 and distribute the remaining 0.2 of the weight among the other indexes in proportion to their values. It then generates the facial expression relationship information using this corrected vector.
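The "neutral" correction for a shy participant can be sketched as follows. This sketch reads "distribute the remaining 0.2 of the weight proportionally" as follows: 20% of the neutral score is removed and redistributed to the other indexes in proportion to their current values, so the total is unchanged. That reading, the equal-split fallback when all other indexes are zero, and the name `reweight` are assumptions.

```python
from typing import Dict

def reweight(scores: Dict[str, float], target: str = "neutral", weight: float = 0.8) -> Dict[str, float]:
    """Scale `target` by `weight` and give the removed amount to the other indexes pro rata."""
    others = {name: v for name, v in scores.items() if name != target}
    if not others:
        return dict(scores)
    removed = scores[target] * (1.0 - weight)  # e.g. 20% of the neutral score
    total_others = sum(others.values())
    corrected = {target: scores[target] * weight}
    for name, v in others.items():
        # proportional to the current value; split evenly if all others are zero (assumption)
        share = v / total_others if total_others > 0 else 1.0 / len(others)
        corrected[name] = v + removed * share
    return corrected
```

For example, `{"neutral": 0.5, "happiness": 0.3, "sadness": 0.2}` becomes approximately `{"neutral": 0.4, "happiness": 0.36, "sadness": 0.24}`.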

The information processing device 10 may also acquire biometric information about the participants, such as pulse waves and brain waves. It may further acquire environmental information about the participants' surroundings, such as temperature, humidity, carbon dioxide concentration, and illuminance. The unit 15 may then refer to this biometric and environmental information when generating the facial expression relationship information.

As an example, the unit 15 determines the first participant's stress state from the first participant's pulse wave or respiration. It then refers to the second indexes expressing the second participant's facial expression immediately before or at that time. From these, it estimates which facial expression of the second participant causes the first participant stress. The unit 15 may designate that estimated expression as an NG (no-good) facial expression with respect to the first participant and include this information in the facial expression relationship information. An NG facial expression can be regarded as an index of how one participant's facial expression affects another participant's stress state. Similarly, the unit 15 may estimate which facial expression of the first participant causes the second participant stress, and designate it as an NG facial expression with respect to the second participant.
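One way to estimate NG facial expressions is a simple tally, sketched below under loud assumptions. The per-frame stress flags are assumed to have been computed elsewhere from pulse or respiration; that step is not shown. At each onset of a stress episode, the other participant's dominant expression in the frame just before (or at the first frame) is counted. Expressions accounting for at least a hypothetical `min_share` of onsets are reported as NG. The dominant-expression rule (largest vector component) is also an assumption.

```python
from collections import Counter
from typing import List, Sequence

EXPRESSIONS = ("anger", "contempt", "disgust", "fear",
               "happiness", "neutral", "sadness", "surprise")

def dominant_expression(vector: Sequence[float]) -> str:
    """Name of the largest component of a facial expression vector."""
    return EXPRESSIONS[max(range(len(EXPRESSIONS)), key=lambda i: vector[i])]

def ng_expressions(stressed: Sequence[bool], other_vectors: Sequence[Sequence[float]],
                   min_share: float = 0.5) -> List[str]:
    """Expressions of the other participant that most often precede a stress onset."""
    counts: Counter = Counter()
    for t, is_stressed in enumerate(stressed):
        if is_stressed and (t == 0 or not stressed[t - 1]):  # onset of a stress episode
            before = other_vectors[t - 1] if t > 0 else other_vectors[t]  # just before, or at, onset
            counts[dominant_expression(before)] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(name for name, c in counts.items() if c / total >= min_share)
```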

The unit 15 may also calculate, over a predetermined period, the correlation between two quantities. The first is the change in the environmental information around the participants. The second is the change in the average of the first participant's first facial expression vector and the second participant's second facial expression vector. The unit may include this correlation in the facial expression relationship information. It can be regarded as an index of how the participants' surroundings affect the communication state between them.

<Dialogue management process>
As an example, the unit 15 performs a dialogue management process. Via the communication unit 11, it refers to the user IDs of the first and second participants and to time stamps marking when each participant started and ended the conference. Consider the data at a given point in one participant's section time-series numerical data. For that point, the unit 15 extracts the user IDs of the other participants who were in dialogue at that time, thereby determining with which participant the data was obtained. It may include this result in the facial expression relationship information.
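The partner lookup can be sketched by treating each user ID's start and end time stamps as a closed interval. The other participants whose intervals contain a sample's time are then that sample's dialogue partners. The dictionary layout and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sessions = Dict[str, Tuple[float, float]]  # user ID -> (start time stamp, end time stamp)

def dialogue_partners(user_id: str, t: float, sessions: Sessions) -> List[str]:
    """User IDs of the other participants whose [start, end] interval contains time t."""
    return sorted(
        other for other, (start, end) in sessions.items()
        if other != user_id and start <= t <= end
    )

def tag_with_partners(user_id: str, samples: Sequence[Tuple[float, float]],
                      sessions: Sessions) -> List[Tuple[float, float, List[str]]]:
    """Attach the dialogue partners to each (t, value) sample of one participant's data."""
    return [(t, value, dialogue_partners(user_id, t, sessions)) for t, value in samples]
```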

(Utterance relationship information generation unit 16)
The utterance relationship information generation unit 16 refers to the first utterance information and the second utterance information. From these, it generates utterance relationship information indicating the relationship between the first participant and the second participant with respect to utterances. The unit 16 acquires the first and second utterance information from the voice information acquisition unit 14. It uses this information to evaluate, from both participants' utterances, how closely their emotions agree during the conference. In this way, it evaluates the communication state between the participants.

The utterance information that the unit 16 acquires from the voice information acquisition unit 14 is calculated from the section time-series text data. It is therefore based on each participant's utterances in real time or over time. Because the unit 16 generates the utterance relationship information from this information, the result represents the participants' utterance relationship in real time or over time.

<Utterance ratio determination>
The utterance relationship information generation unit 16 may generate utterance time relationship information and include it in the utterance relationship information. The utterance time relationship information indicates the relationship between the utterance time of the first participant, given by the first utterance information, and the utterance time of the second participant, given by the second utterance information.

As an example, the utterance relationship information generation unit 16 calculates the utterance ratio between the first participant's utterance time and the second participant's utterance time within a predetermined period, and includes it in the utterance relationship information. The calculated utterance ratio can be regarded as an index of how evenly balanced the relationship between the participants is.
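The following is a minimal sketch of the utterance ratio. It makes two assumptions. First, each participant's speech is available as (start, end) segments in seconds, for example derived from the time stamps of the section time-series text data. Second, the "ratio" is expressed as each participant's share of the combined speaking time. The function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start_s, end_s) of one speech segment.
Segment = Tuple[float, float]


def speaking_time(segments: List[Segment], win_start: float, win_end: float) -> float:
    """Total seconds of speech that fall inside the window [win_start, win_end)."""
    total = 0.0
    for start, end in segments:
        # Clip each segment to the window; segments outside it contribute nothing.
        overlap = min(end, win_end) - max(start, win_start)
        if overlap > 0:
            total += overlap
    return total


def utterance_ratio(first: List[Segment], second: List[Segment],
                    win_start: float, win_end: float) -> Tuple[float, float]:
    """Each participant's share of the combined speaking time in the window."""
    t1 = speaking_time(first, win_start, win_end)
    t2 = speaking_time(second, win_start, win_end)
    total = t1 + t2
    if total == 0:
        return (0.0, 0.0)  # nobody spoke in this window
    return (t1 / total, t2 / total)
```

Under this reading, a result of (0.5, 0.5) is the most evenly balanced case. A direct quotient t1 / t2 would be an equally plausible way to express the ratio.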

<Utterance frequency determination>
The utterance relationship information generation unit 16 may also determine whether at least one of the first utterance information and the second utterance information contains utterance content belonging to a specific category. It may then include information corresponding to the determination result in the utterance relationship information.

Examples of specific categories of utterance content include:
- open questions;
- prompting words, such as "And then?", "I see", and "Indeed";
- parroting, that is, repeating back the other person's words;
- interrupting another participant's utterance;
- talking over another participant's utterance;
- negative words, such as "but" and "though".

As an example, the utterance relationship information generation unit 16 calculates how often utterance content in such a category is uttered within a predetermined period. It then includes information on the calculated frequency in the utterance relationship information.

Specifically, as an example, the category is set to open questions. The utterance relationship information generation unit 16 extracts, from the first participant's section time-series text data, the text data representing open questions within a given period. It divides the number of words in the extracted text data by the number of words in all text data within that period, and takes the result as the open question rate, that is, the frequency with which open questions were uttered. In the same way, it calculates the second participant's open question rate from the second participant's section time-series text data. It then compares the two participants' open question rates to calculate an open question ratio and includes that ratio in the utterance relationship information. The open question ratio can also be regarded as an index of how evenly balanced the relationship between the participants is. The utterance relationship information generation unit 16 may also include each participant's open question rate in the utterance relationship information.
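The following sketch shows the word-count arithmetic for the open question rate. Its assumptions are as follows:
- Open questions are detected by a crude, hypothetical English heuristic: the utterance ends in "?" and starts with a wh-word. A real system for Japanese speech would need its own classifier.
- The open question ratio is expressed as the first participant's share of the two rates.
- All names are illustrative.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for an open-question classifier.
OPEN_QUESTION_WORDS = {"what", "why", "how", "which", "where", "when", "who"}


def is_open_question(utterance: str) -> bool:
    """Crude heuristic: a question whose first word is a wh-word."""
    text = utterance.strip()
    words = text.lower().split()
    return bool(words) and text.endswith("?") and words[0].strip(",?") in OPEN_QUESTION_WORDS


def category_rate(utterances: List[str], in_category: Callable[[str], bool]) -> float:
    """Words in category utterances divided by all words in the period."""
    total_words = sum(len(u.split()) for u in utterances)
    if total_words == 0:
        return 0.0
    category_words = sum(len(u.split()) for u in utterances if in_category(u))
    return category_words / total_words


def rate_share(rate_first: float, rate_second: float) -> Optional[float]:
    """First participant's share of the combined rate; 0.5 means balanced."""
    total = rate_first + rate_second
    return None if total == 0 else rate_first / total
```

The same `category_rate` function applies to any other category, given a suitable predicate.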

Similarly, the category may be set to prompting words. The utterance relationship information generation unit 16 extracts, from the first participant's section time-series text data, the text data representing prompting words within a given period. It divides the number of words in the extracted text data by the number of words in all text data within that period, and takes the result as the prompting question rate, that is, the frequency with which prompting words were uttered. In the same way, it calculates the second participant's prompting question rate from the second participant's section time-series text data. It then compares the two participants' prompting question rates to calculate a prompting question ratio and includes that ratio in the utterance relationship information. The prompting question ratio can also be regarded as an index of how evenly balanced the relationship between the participants is. The utterance relationship information generation unit 16 may also include each participant's prompting question rate in the utterance relationship information.
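Because the relationship information may be real-time or time-series, a rate like the prompting question rate can also be computed per fixed-length window rather than once for the whole meeting. The following is a minimal sketch under these assumptions:
- Utterances arrive as (timestamp, text) pairs.
- An utterance counts as a prompting word when its normalized text matches a small, hypothetical English phrase list standing in for the Japanese examples.

```python
from typing import Dict, List, Tuple

# Hypothetical English stand-ins for the Japanese examples in the text.
PROMPT_PHRASES = {"and then", "i see", "indeed"}


def normalize(text: str) -> str:
    """Lower-case and strip surrounding whitespace and punctuation."""
    return text.lower().strip(" ?!.,")


def windowed_prompt_rate(utterances: List[Tuple[float, str]],
                         window_s: float) -> Dict[int, float]:
    """Prompting question rate per fixed-length window for one participant.

    utterances: (timestamp_s, text) pairs.
    Returns {window_index: prompting words / all words in that window}.
    """
    total_words: Dict[int, int] = {}
    prompt_words: Dict[int, int] = {}
    for t, text in utterances:
        w = int(t // window_s)  # index of the window this utterance falls in
        n = len(text.split())
        total_words[w] = total_words.get(w, 0) + n
        if normalize(text) in PROMPT_PHRASES:
            prompt_words[w] = prompt_words.get(w, 0) + n
    return {w: prompt_words.get(w, 0) / total
            for w, total in total_words.items() if total > 0}
```

Running this for both participants and comparing their values window by window gives a time series of the prompting question ratio.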

<Word-based evaluation>
The utterance relationship information generation unit 16 may also extract words that appear relatively frequently within a predetermined time from at least one of the first utterance information and the second utterance information. It may then include the extracted words in the utterance relationship information.

As an example, the utterance relationship information generation unit 16 works on the section time-series text data of the first and second participants within a predetermined period. For each participant, it counts the occurrences of each word, ranks the words, and extracts several words from the top of the ranking. It includes these highest-frequency words in the utterance relationship information as frequent words. It may also determine whether the two participants' frequent words, and their ranks, match, and include the result in the utterance relationship information.
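The following sketch shows the ranking and the rank-match check using Python's `collections.Counter`. It assumes that each participant's words for the period are already tokenized; for Japanese text this would be done upstream by morphological analysis. It also assumes that stop words have already been removed. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def top_words(tokens: List[str], k: int) -> List[str]:
    """The k most frequent tokens, most frequent first.

    Counter.most_common orders equal counts by first occurrence, so ties keep
    the order in which the words were first spoken.
    """
    return [word for word, _ in Counter(tokens).most_common(k)]


def frequent_words_match(tokens_a: List[str], tokens_b: List[str], k: int) -> bool:
    """True when both participants have the same top-k words in the same order."""
    return top_words(tokens_a, k) == top_words(tokens_b, k)
```

A looser variant could compare the two top-k lists as sets, ignoring rank.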

The utterance relationship information generation unit 16 may also extract, for each participant, the words contained in the first and second participants' section time-series text data within a predetermined period. It may then calculate the match rate of the extracted words and include it in the utterance relationship information. The word match rate can be calculated as the proportion of words shared by the two participants among all words in the section time-series text data within the predetermined period. It can be regarded as an index of how well the participants are parroting (echoing back) each other's words.
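The text does not say whether "words" means distinct words or every occurrence. This sketch assumes distinct words: shared vocabulary divided by the combined vocabulary, which is a Jaccard index. That choice is an assumption, not necessarily the method intended by the source.

```python
from typing import Iterable


def word_match_rate(tokens_a: Iterable[str], tokens_b: Iterable[str]) -> float:
    """Shared distinct words / all distinct words used by either participant."""
    vocab_a, vocab_b = set(tokens_a), set(tokens_b)
    union = vocab_a | vocab_b
    if not union:
        return 0.0  # neither participant spoke in the period
    return len(vocab_a & vocab_b) / len(union)
```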

The utterance relationship information generation unit 16 may also extract utterance timings from the first and second participants' section time-series text data and calculate the temporal overlap between them. It may then count the overlaps within a predetermined period, treat that count as an overlap frequency, and include it in the utterance relationship information. The overlap frequency can be regarded as an index of how often a participant interrupts other participants' utterances.
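The following minimal sketch counts overlaps. It assumes utterance timings are available as (start, end) segments in seconds, and it counts each overlapping pair of segments once. The names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s), hypothetical representation


def overlap_count(first: List[Segment], second: List[Segment],
                  win_start: float, win_end: float) -> int:
    """Number of (first, second) segment pairs that overlap inside the window."""
    count = 0
    for s1, e1 in first:
        for s2, e2 in second:
            # Shared interval of both segments, clipped to the window.
            start = max(s1, s2, win_start)
            end = min(e1, e2, win_end)
            if end > start:  # segments that merely touch are not counted
                count += 1
    return count
```

The nested loop is quadratic, which is fine for short windows. For long recordings, sorting both segment lists and sweeping them with two pointers would be the usual optimization.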

The utterance relationship information generation unit 16 may also refer to participant information indicating the attributes of the first and second participants when generating the utterance relationship information. Participant information includes at least one of the following: age, gender, blood type, personality, place of origin, family relationships, job title, years of service, number of job changes, and job history. The participant information also includes the participant's history of using this system.

The information processing apparatus 10 may be configured to acquire two further kinds of information. The first is biological information, such as the participants' pulse waves and brain waves. The second is environmental information, such as the temperature, humidity, carbon dioxide concentration, and illuminance around the participants. The utterance relationship information generation unit 16 may then also refer to the biological and environmental information when generating the utterance relationship information.

As an example, the utterance relationship information generation unit 16 refers to two things: the first participant's stress state, determined from the first participant's pulse wave or respiration, and the second participant's section time-series text data immediately before or at that time. From these, it estimates which of the second participant's text data causes stress to the first participant. It may then designate that text data as an NG word (a word to avoid) for the first participant and include this information in the utterance relationship information. One participant's NG words for another can be regarded as an index of how the first participant's remarks affect the other participant's stress state. Similarly, the unit may estimate which of the first participant's text data causes stress to the second participant, and designate NG words for the second participant.

As another example, the utterance relationship information generation unit 16 refers to two things: the activity level of the first participant's thinking, determined from the first participant's brain waves, and the second participant's section time-series text data immediately before or at that time. From these, it estimates which of the second participant's text data stimulates the first participant's thinking. It may then designate that text data as an important word for the first participant and include this information in the utterance relationship information. One participant's important words for another can be regarded as an index of how the first participant's remarks affect the other participant's thinking. Similarly, the unit may estimate which of the first participant's text data stimulates the second participant's thinking, and designate important words for the second participant.
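The NG-word and important-word estimations share one pattern: find the other participant's words that repeatedly precede an event in one participant's biological signal. This sketch assumes the events (a stress episode, or a rise in thinking activity) have already been detected and are given as time stamps; detecting them from pulse, respiration, or EEG is outside its scope. The look-back window, the minimum event count, and the function name are all hypothetical choices.

```python
from collections import Counter
from typing import List, Tuple


def words_preceding_events(other_utterances: List[Tuple[float, str]],
                           event_times: List[float],
                           lookback_s: float,
                           min_events: int = 2) -> List[str]:
    """Words the other participant spoke shortly before at least min_events events.

    other_utterances: (timestamp_s, text) pairs of the other participant.
    event_times: time stamps of detected events in this participant's signal.
    """
    counts: Counter = Counter()
    for t_event in event_times:
        words = set()  # a word counts at most once per event
        for t, text in other_utterances:
            if t_event - lookback_s <= t <= t_event:
                words.update(text.lower().split())
        counts.update(words)
    return sorted(w for w, c in counts.items() if c >= min_events)
```

Feeding in stress-event times yields NG-word candidates; feeding in thinking-activation times yields important-word candidates.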

The utterance relationship information generation unit 16 may also calculate how closely the participants agree in voice tone, speaking speed, volume, and similar features.
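The source does not define how the degree of agreement is computed. One simple possibility, assumed here, is a normalized difference between the two participants' average feature values. It is 1.0 when the values are equal and falls toward 0.0 as they diverge. The feature names are hypothetical.

```python
from typing import Dict


def agreement(a: float, b: float) -> float:
    """1.0 when the values are equal, falling toward 0.0 as they diverge.

    Intended for non-negative features such as pitch or speaking rate, for
    which the result stays in [0.0, 1.0].
    """
    larger = max(abs(a), abs(b))
    if larger == 0:
        return 1.0
    return 1.0 - abs(a - b) / larger


def prosody_agreement(features_a: Dict[str, float],
                      features_b: Dict[str, float]) -> Dict[str, float]:
    """Agreement for every feature measured for both participants."""
    return {name: agreement(features_a[name], features_b[name])
            for name in features_a if name in features_b}
```

Correlating the two participants' feature time series would be an alternative that captures co-movement rather than closeness of averages.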

The utterance relationship information generation unit 16 may also extract logs of the first and second participants' utterances from the accumulated section time-series text data and obtain morphological analysis data for those logs. From this, it may list the words used frequently in past utterances and include the list in the utterance relationship information. Presenting these frequent words to both participants during the conference can help them decide on a theme for the conference.

<Dialogue management processing>
As an example, the utterance relationship information generation unit 16 performs dialogue management processing. To do so, it refers, via the communication unit 11, to the user IDs of the first participant and the second participant and to time stamps indicating when the participant represented by each user ID started and ended the meeting. For data at a given time point in one participant's section time-series text data, the utterance relationship information generation unit 16 may extract the user IDs of the other participants interacting at that time point. It may then determine with which participant the data was obtained in dialogue and include the result in the utterance relationship information.
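The dialogue management lookup can be sketched as an interval query over per-user session records. It applies equally to section time-series numerical data and to section time-series text data. The sketch assumes every record belongs to the same meeting and holds the join and leave time stamps mentioned in the text. The `Session` type and `partners_at` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    user_id: str
    start: float  # time stamp when this user started the meeting
    end: float    # time stamp when this user ended the meeting


def partners_at(sessions: List[Session], user_id: str, t: float) -> List[str]:
    """User IDs of the other participants present at time t (sorted)."""
    if not any(s.user_id == user_id and s.start <= t < s.end for s in sessions):
        return []  # the user was not in the meeting at time t
    return sorted({s.user_id for s in sessions
                   if s.user_id != user_id and s.start <= t < s.end})
```

Tagging each data point with `partners_at(sessions, uid, t)` records which dialogue partner the data was obtained with.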

(Relationship information generation unit 17)
The relationship information generation unit 17 refers to the facial expression relationship information and the utterance relationship information to generate relationship information. This is real-time or time-series information indicating the relationship between the first participant and the second participant. Evaluating both the facial expressions and the utterances of both participants allows their state of communication to be evaluated in more detail. In addition, because the relationship information generation unit 17 works from the participants' real-time or time-series facial expression and utterance information, the state of communication can be evaluated in real time or over time.

The relationship information generation unit 17 may generate presentation information to be presented to at least one of the first participant and the second participant. The presentation information may include, for example, the degree of emotional agreement between the two participants, evaluated comprehensively from the facial expression relationship information and the utterance relationship information. For example, the degree of emotional agreement may be set higher when the gaze match rate is high and the utterance ratio is balanced.
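The example in the text (high gaze match rate plus a balanced utterance ratio gives higher agreement) can be turned into a score in many ways. The following is one hypothetical scheme: a weighted mean of the gaze match rate and a "talk balance" term, both in [0, 1]. The weights and the balance formula are assumptions for illustration only.

```python
def talk_balance(share_first: float) -> float:
    """1.0 for a 50/50 split, 0.0 when one participant does all the talking."""
    return 1.0 - 2.0 * abs(share_first - 0.5)


def combined_agreement(gaze_match_rate: float, share_first: float,
                       w_gaze: float = 0.5, w_balance: float = 0.5) -> float:
    """Weighted mean of gaze match rate and talk balance, both in [0, 1].

    share_first: first participant's share of the combined speaking time.
    """
    if w_gaze + w_balance <= 0:
        raise ValueError("weights must sum to a positive value")
    return ((w_gaze * gaze_match_rate + w_balance * talk_balance(share_first))
            / (w_gaze + w_balance))
```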

Presenting the presentation information generated by the relationship information generation unit 17 feeds the relationship between the participants back to them. If the presentation information is presented in real time, the participants can check their relationship during the conversation, which can also prompt them to improve their communication as it happens.

The presentation information may be presented to both the first participant and the second participant, or to only one of them. The relationship information may present the same content to both participants, or different content to each. Generating relationship information that presents the same content to both can be expected to help build a flat (non-hierarchical) relationship between them. The participants may also be allowed to choose the presentation information themselves, or the presentation information may be changed according to rules or an agreement between the participants.

The relationship information may include information on the proportion between the first and second participants' utterance times. It may also include information on how the match rate between the two participants' gaze directions changes over time. Other possible contents include changes over time in the facial expression match rate or mismatch rate, the forward-lean rate, the text of the utterances, and frequent words. The relationship information may further include the participants' IDs, each participant's own face image, avatar images representing the other participants' facial expressions, and recommended agenda items or recommended words extracted from accumulated data on the basis of the utterance content.

The presentation information may also include evaluation results intended to improve the participants' communication skills. For example, the facial expression match rate may be shown together with a message encouraging participants to raise it, since doing so improves reflective listening skills. Likewise, the gaze match rate may be shown together with a message encouraging participants to raise it by adopting a posture and attitude suited to communication. Recommended words and questions may also be presented to encourage participants to raise the level of their dialogue and questions.

Specific examples of methods for presenting the presentation information include the following:
- displaying it on each participant's display unit (display unit 23 and display unit 33);
- displaying it on a common display visible to all participants;
- distributing it over a network or the like to people other than the participants;
- a physical action, such as vibration or electrical stimulation, from a wearable device such as a wristwatch-type device;
- a physical action of environmental equipment such as lighting, air conditioning, or music, for example lighting the room red when the discussion becomes heated;
- an image corresponding to an emotion index, for example an erupting volcano representing anger;
- the facial expression of an avatar corresponding to an emotion index.

Examples of screens that present the presentation information on at least one of display unit 23 and display unit 33 are described with reference to FIGS. 4 and 5. FIG. 4 shows an example of information presented by an information processing system that includes an information processing apparatus according to an embodiment of the present invention. FIG. 5 shows another example of information presented by such a system.

As shown in FIG. 4, screen 400 displays the participants' user IDs in region 401 and their face images in region 402, identifying the participants to whom the presentation information relates. The remaining regions are used as follows:
- Region 403 displays the utterance ratio as a talk ratio, for example as a pie chart.
- Region 404 displays an avatar whose facial expression corresponds to an index of the emotion of the participant in the dialogue.
- Region 405 graphs the change in the facial expression match rate over time.
- Region 406 displays text of the utterance content as a "TalkStream", together with conversation themes and words recommended during the conference.

Regions 403 to 405 let participants check the current state of communication at a glance during the conference. The facial expression match rate in region 405 is an example of presentation information generated from time-series facial expression relationship information, covering the past up to the present. The avatar expression in region 404 is an example of presentation information generated from real-time facial expression relationship information.

As shown in FIG. 5, screen 500 may, like screen 400, display the user IDs in region 501, the face images in region 502, the talk ratio in region 503, and the avatar's facial expression in region 504. It may display the utterance content, recommended themes, and similar items in region 505, and also display the gaze match rate, instead of the facial expression match rate, in region 505.

<Appendix 1>
Part or all of the processing performed by the control units of the terminal devices may instead be performed by the calculation unit 12 of the information processing apparatus 10. For example, the calculation unit 12 may acquire images captured by the camera 21 via the communication unit 11. The facial expression information acquisition unit 13 may then generate the first facial expression information, regarding the first participant's facial expression, and the second facial expression information, regarding the second participant's facial expression.

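<Appendix 2>
The above example describes a conference between two persons, the first participant and the second participant, but the present embodiment is not limited to this. The invention described in this specification can also be applied to a conference of N persons, where N is 3 or more. In that case, the configuration described in this specification can be applied individually to every pair of persons among the N. For example, for a conference of three persons A, B, and C, it can be applied individually to each of the three pairs (A, B), (A, C), and (B, C).
Viewed this way, the invention described in the present embodiment generates "relationship information" indicating the relationships between some or all of the N participants. It does so using data representing the states of the N persons, history data of those states, and environmental information on the N persons.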
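The pairwise application to N participants can be sketched with `itertools.combinations`. It yields every unordered pair exactly once, giving N(N-1)/2 pairs. Here `evaluate_pair` is a hypothetical stand-in for the two-participant processing described above.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple, TypeVar

R = TypeVar("R")


def pairwise_relationships(participants: Sequence[str],
                           evaluate_pair: Callable[[str, str], R]) -> Dict[Tuple[str, str], R]:
    """Apply a two-participant evaluation to every unordered pair.

    For N participants this yields N * (N - 1) / 2 pairs, e.g. 3 pairs for
    (A, B, C): (A, B), (A, C), (B, C).
    """
    return {(a, b): evaluate_pair(a, b) for a, b in combinations(participants, 2)}
```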
<Appendix 3>
The above example assumes that both the first participant and the second participant are human, but this does not limit the present embodiment.
For example, the second participant may be a simulated human rendered by a computer, such as a preset avatar or a BOT. In that configuration, the second terminal device is not essential. The facial expression information acquisition unit 13 and the voice information acquisition unit 14 may acquire the BOT's prepared facial expressions and utterance content as the second participant's facial expression information and voice information.
The BOT's facial expressions and utterance content may come from data created before the conference. Alternatively, they may be adapted during the conference according to the first participant's facial expressions and utterances.

[Example of software implementation]
The control blocks of the information processing apparatus 10, in particular the calculation unit 12, may be implemented by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or by software.

In the latter case, the information processing apparatus 10 includes a computer that executes the instructions of a program, that is, software that implements each function. The computer includes, for example, one or more processors and a computer-readable recording medium storing the program. The object of the present invention is achieved when the processor reads the program from the recording medium and executes it. The processor may be, for example, a CPU (Central Processing Unit). The recording medium may be a "non-transitory tangible medium", such as a ROM (Read Only Memory), tape, disk, card, semiconductor memory, or programmable logic circuit. The computer may also include a RAM (Random Access Memory) into which the program is loaded. The program may be supplied to the computer via any transmission medium capable of transmitting it, such as a communication network or a broadcast wave. One aspect of the present invention may also be realized as a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

100 Information processing system
10 Information processing apparatus
20 First terminal device
30 Second terminal device
13 Facial expression information acquisition unit
14 Voice information acquisition unit
15 Facial expression relationship information generation unit
16 Utterance relationship information generation unit
17 Relationship information generation unit

Claims

1. An information processing apparatus comprising:
a facial expression information acquisition unit that acquires first facial expression information regarding a facial expression of a first participant among a plurality of participants and second facial expression information regarding a facial expression of a second participant among the plurality of participants;
a voice information acquisition unit that acquires first utterance information regarding an utterance of the first participant and second utterance information regarding an utterance of the second participant;
a facial expression relationship information generation unit that refers to the first facial expression information and the second facial expression information to generate facial expression relationship information indicating a facial expression relationship between the first participant and the second participant;
an utterance relationship information generation unit that refers to the first utterance information and the second utterance information to generate utterance relationship information indicating an utterance relationship between the first participant and the second participant; and
a relationship information generation unit that refers to the facial expression relationship information and the utterance relationship information to generate relationship information, which is information indicating a relationship between the first participant and the second participant.
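The data flow of claim 1 can be illustrated with a short Python sketch. This is not the applicant's implementation: it assumes that the facial expression relationship and the utterance relationship have each already been reduced to a score in [0, 1], and it combines them by a weighted average. The `RelationshipInfo` type, the `generate_relationship_info` function, and the `weight` parameter are hypothetical; the claim does not specify how the two kinds of information are combined.

```python
from dataclasses import dataclass


@dataclass
class RelationshipInfo:
    """Hypothetical container for the output of the relationship information generation unit."""
    expression_score: float  # assumed 0..1 score derived from facial expression relationship info
    utterance_score: float   # assumed 0..1 score derived from utterance relationship info
    overall: float           # combined score indicating the participants' relationship


def generate_relationship_info(expression_score: float,
                               utterance_score: float,
                               weight: float = 0.5) -> RelationshipInfo:
    """Combine the two relationship scores by a weighted average (an illustrative rule only)."""
    for value in (expression_score, utterance_score, weight):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores and weight must lie in [0, 1]")
    overall = weight * expression_score + (1.0 - weight) * utterance_score
    return RelationshipInfo(expression_score, utterance_score, overall)


# Example: fairly similar expressions, somewhat unbalanced conversation.
info = generate_relationship_info(expression_score=0.8, utterance_score=0.4)
print(round(info.overall, 2))  # 0.6
```

A weighted average is only one possible rule; a deployed system could equally use a learned model or keep the two kinds of information separate.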

2. The information processing apparatus according to claim 1, wherein the relationship information is real-time or temporal information indicating the relationship between the first participant and the second participant.

3. The information processing apparatus according to claim 1, wherein
the first facial expression information includes a plurality of first indexes expressing the facial expression of the first participant,
the second facial expression information includes a plurality of second indexes expressing the facial expression of the second participant, and
the facial expression relationship information generation unit generates facial expression difference information regarding a difference between the first indexes and the second indexes, and includes the generated facial expression difference information in the facial expression relationship information.
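Claim 3 leaves open both what the indexes are and how their difference is expressed. As a hedged sketch, the example below assumes that each participant's expression is given as named scores in [0, 1] (the names `joy`, `surprise`, and `anger` are hypothetical, for example the output of some facial expression estimator) and computes the per-index absolute difference, plus its mean as a single summary value.

```python
def expression_difference(first: dict[str, float],
                          second: dict[str, float]) -> dict[str, float]:
    """Per-index absolute difference between two participants' expression indexes.

    Only indexes present for both participants are compared.
    """
    shared = sorted(first.keys() & second.keys())
    return {name: abs(first[name] - second[name]) for name in shared}


def mean_difference(diff: dict[str, float]) -> float:
    """Average per-index difference; 0.0 means identical index values."""
    return sum(diff.values()) / len(diff) if diff else 0.0


# Hypothetical index values for one moment of the conversation.
first = {"joy": 0.7, "surprise": 0.2, "anger": 0.1}
second = {"joy": 0.4, "surprise": 0.2, "anger": 0.1}
diff = expression_difference(first, second)
print({k: round(v, 2) for k, v in diff.items()})  # {'anger': 0.0, 'joy': 0.3, 'surprise': 0.0}
print(round(mean_difference(diff), 2))  # 0.1
```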

4. The information processing apparatus according to any one of claims 1 to 3, wherein
the first facial expression information includes first line-of-sight information regarding a line-of-sight direction of the first participant,
the second facial expression information includes second line-of-sight information regarding a line-of-sight direction of the second participant, and
the facial expression relationship information generation unit refers to the first line-of-sight information and the second line-of-sight information to generate line-of-sight relationship information, and includes the generated line-of-sight relationship information in the facial expression relationship information.
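The claim does not fix how line-of-sight information is encoded. One possibility, assumed here purely for illustration, is a per-frame label of where each participant is looking (for example `"partner"` when looking at the other participant's image on screen). The line-of-sight relationship information is then a per-frame mutual-gaze flag together with the fraction of frames in which both participants look at each other. The labels and function names are hypothetical.

```python
def mutual_gaze_flags(first: list[str], second: list[str],
                      target: str = "partner") -> list[bool]:
    """True for each frame in which both participants look at each other.

    Frames are paired by index; extra frames in the longer list are ignored.
    """
    return [a == target and b == target for a, b in zip(first, second)]


def match_rate(flags: list[bool]) -> float:
    """Fraction of frames with mutual gaze (0.0 for an empty sequence)."""
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical per-frame gaze labels for the two participants.
first = ["partner", "away", "partner", "document"]
second = ["partner", "partner", "partner", "partner"]
flags = mutual_gaze_flags(first, second)
print(flags)              # [True, False, True, False]
print(match_rate(flags))  # 0.5
```

If gaze were instead available as 3D direction vectors, the flag could be replaced by an angular-threshold test; the label form is used here only to keep the sketch short.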

5. The information processing apparatus according to claim 1, wherein the utterance relationship information generation unit generates utterance time relationship information indicating a relationship between the utterance time of the first participant indicated by the first utterance information and the utterance time of the second participant indicated by the second utterance information, and includes the generated utterance time relationship information in the utterance relationship information.
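Claim 5 does not say which time relationship is computed. A minimal sketch, assuming that each participant's utterances are available as (start, end) intervals in seconds (for example from voice activity detection), could report each participant's total speaking time, the time both speak at once, and the first participant's share of the total. All names below are hypothetical.

```python
Interval = tuple[float, float]  # (start, end) of one utterance, in seconds


def total_time(intervals: list[Interval]) -> float:
    """Total speaking time, assuming one participant's own intervals do not overlap."""
    return sum(end - start for start, end in intervals)


def overlap_time(first: list[Interval], second: list[Interval]) -> float:
    """Total time during which both participants speak simultaneously."""
    total = 0.0
    for s1, e1 in first:
        for s2, e2 in second:
            total += max(0.0, min(e1, e2) - max(s1, s2))
    return total


def utterance_time_relationship(first: list[Interval],
                                second: list[Interval]) -> dict[str, float]:
    """Illustrative utterance time relationship information for two participants."""
    t1, t2 = total_time(first), total_time(second)
    return {
        "first_time": t1,
        "second_time": t2,
        "overlap_time": overlap_time(first, second),
        "first_share": t1 / (t1 + t2) if t1 + t2 > 0 else 0.0,
    }


first = [(0.0, 10.0), (20.0, 25.0)]
second = [(8.0, 20.0), (24.0, 30.0)]
result = utterance_time_relationship(first, second)
print(result["first_time"], result["second_time"], result["overlap_time"])  # 15.0 18.0 3.0
```

The nested loop is quadratic in the number of utterances, which is acceptable for a sketch; sorted intervals would allow a linear sweep.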

6. The information processing apparatus according to claim 1, wherein the utterance relationship information generation unit determines whether at least one of the first utterance information and the second utterance information includes utterance content belonging to a specific category, and includes information corresponding to a result of the determination in the utterance relationship information.
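The claim does not define the specific categories or how membership is decided. The sketch below assumes a simple keyword approach on transcribed English text: each hypothetical category (`apology`, `gratitude`) is defined by a few trigger keywords, and the result records, per category, whether either participant's utterance contains one. A real system might use a trained text classifier instead.

```python
# Hypothetical "specific categories", each defined by a few trigger keywords.
SPECIFIC_CATEGORIES: dict[str, tuple[str, ...]] = {
    "apology": ("sorry", "apologize", "apologise"),
    "gratitude": ("thank", "appreciate"),
}


def detect_categories(first_text: str, second_text: str,
                      categories: dict[str, tuple[str, ...]] = SPECIFIC_CATEGORIES
                      ) -> dict[str, bool]:
    """Report, per category, whether either participant's utterance contains a trigger keyword.

    Uses plain substring matching on lower-cased text; each text is checked
    separately so that no match can span the two utterances.
    """
    texts = (first_text.lower(), second_text.lower())
    return {
        name: any(keyword in text for text in texts for keyword in keywords)
        for name, keywords in categories.items()
    }


result = detect_categories("Thank you for joining today.", "No problem at all.")
print(result)  # {'apology': False, 'gratitude': True}
```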

7. The information processing apparatus according to any one of claims 1 to 6, wherein the utterance relationship information generation unit extracts, from at least one of the first utterance information and the second utterance information, a word having a relatively high frequency of appearance within a predetermined time, and includes the extracted word in the utterance relationship information.
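Extracting frequent words within a predetermined time can be sketched with `collections.Counter`. This assumes timestamped transcripts in English and a simple regular-expression tokenizer; Japanese speech would need a morphological analyzer instead. The stop-word list, the `window` and `top_k` parameters, and the function name are illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real deployment would use a fuller, language-specific one.
STOP_WORDS = frozenset({"a", "an", "and", "the", "is", "to", "of", "i", "you", "it"})


def frequent_words(utterances: list[tuple[float, str]], now: float,
                   window: float = 60.0, top_k: int = 3) -> list[str]:
    """Return the top_k most frequent non-stop-words spoken in [now - window, now].

    `utterances` holds (timestamp_seconds, transcribed_text) pairs from either or
    both participants. Ties keep the order in which words were first counted.
    """
    counts = Counter()
    for timestamp, text in utterances:
        if now - window <= timestamp <= now:
            for word in re.findall(r"[a-z']+", text.lower()):
                if word not in STOP_WORDS:
                    counts[word] += 1
    return [word for word, _ in counts.most_common(top_k)]


utterances = [
    (0.0, "Budget review"),          # outside a 30 s window ending at t = 60
    (50.0, "The budget is tight"),
    (55.0, "Budget and schedule"),
    (58.0, "Schedule"),
]
print(frequent_words(utterances, now=60.0, window=30.0, top_k=2))  # ['budget', 'schedule']
```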

8. The information processing apparatus according to any one of claims 1 to 7, wherein the relationship information generation unit refers to the relationship information to generate presentation information to be presented to at least one of the first participant and the second participant.

9. The information processing apparatus according to claim 8, wherein the presentation information includes:
information indicating a ratio of the utterance time of the first participant to the utterance time of the second participant; and
information about a change over time in a matching rate between the line-of-sight direction of the first participant and the line-of-sight direction of the second participant.
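The two items of presentation information in claim 9 could be computed as follows. This is a hedged sketch that assumes the utterance times have already been totalled (in seconds) and that the line-of-sight match is available as one boolean per video frame; the fixed-size frame windows and the function names are hypothetical, and the actual rendering (text, chart, and so on) is not shown.

```python
def utterance_shares(first_time: float, second_time: float) -> tuple[float, float]:
    """Express the ratio of the two utterance times as shares that sum to 1."""
    total = first_time + second_time
    if total <= 0:
        return (0.0, 0.0)
    return (first_time / total, second_time / total)


def match_rate_over_time(flags: list[bool], window: int) -> list[float]:
    """Line-of-sight matching rate in consecutive, non-overlapping windows of frames."""
    if window <= 0:
        raise ValueError("window must be positive")
    rates = []
    for start in range(0, len(flags), window):
        chunk = flags[start:start + window]
        rates.append(sum(chunk) / len(chunk))
    return rates


print(utterance_shares(30.0, 10.0))  # (0.75, 0.25)
print(match_rate_over_time([True, True, False, False, True], window=2))  # [1.0, 0.0, 1.0]
```

Non-overlapping windows keep the series short for display; a sliding window would give a smoother curve at the cost of more points.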

10. The information processing apparatus according to claim 1, wherein the facial expression relationship information generation unit and the utterance relationship information generation unit generate the facial expression relationship information and the utterance relationship information, respectively, further with reference to participant information indicating attributes of the first participant and the second participant.

11. An information processing method comprising:
a facial expression information acquisition step of acquiring first facial expression information regarding a facial expression of a first participant among a plurality of participants and second facial expression information regarding a facial expression of a second participant among the plurality of participants;
a voice information acquisition step of acquiring first utterance information regarding an utterance of the first participant and second utterance information regarding an utterance of the second participant;
a facial expression relationship information generation step of referring to the first facial expression information and the second facial expression information to generate facial expression relationship information indicating a facial expression relationship between the first participant and the second participant;
an utterance relationship information generation step of referring to the first utterance information and the second utterance information to generate utterance relationship information indicating an utterance relationship between the first participant and the second participant; and
a relationship information generation step of referring to the facial expression relationship information and the utterance relationship information to generate relationship information, which is real-time or temporal information indicating a relationship between the first participant and the second participant.

12. An information processing program for causing a computer to function as the information processing apparatus according to claim 1, the program causing the computer to function as each of the facial expression information acquisition unit, the voice information acquisition unit, the facial expression relationship information generation unit, the utterance relationship information generation unit, and the relationship information generation unit.