JP4553667B2

JP4553667B2 - Utterance identification method and password verification device using the same

Info

Publication number: JP4553667B2
Application number: JP2004264506A
Authority: JP
Inventors: 真佐彦丸山; 正善坂井; 義大松岡
Original assignee: Nippon Signal Co Ltd
Current assignee: Nippon Signal Co Ltd
Priority date: 2004-09-10
Filing date: 2004-09-10
Publication date: 2010-09-29
Anticipated expiration: 2024-09-10
Also published as: JP2006079456A

Description

The present invention relates to an utterance identification method that identifies utterance content from the movement of a speaker's lips, and in particular to an utterance identification method that reduces errors in identifying utterance content. The invention also relates to a password verification device that uses this utterance identification method to compare a password recognized from the speaker's utterance state against a password registered in advance.

The following identification method has been proposed as a conventional method for identifying a speaker's Japanese utterance content from lip movement (see, for example, Non-Patent Document 1).
In the conventional method, as shown in FIG. 5, the reference point of the upper lip is A1, that of the lower lip is A2, the left and right reference points of the lips are B1 and B2, and the reference point of the lower jaw is A3. The horizontal width W of the lips (B1 to B2), the vertical width H (A1 to A2), and the distance HX from the upper lip to the lower jaw (A1 to A3) are measured, and the utterance content is identified by calculating how W, H, and HX change during the utterance. For example, when “a” and “i” are uttered in succession, let the three variables for “a” be (W1, H1, HX1) and those for “i” be (W2, H2, HX2). The utterance pattern of the two consecutive vowels is then associated with the six variables (W1, H1, HX1) and (W2, H2, HX2), and the continuous utterance pattern “a”, “i” is identified by calculating their change.
Non-Patent Document 1: Watanabe, “Identification of two consecutive vowels by lip reading”, Transactions of the Japan Society of Mechanical Engineers, Series C, Vol. 55, No. 509, January 1989
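The measurements used by the conventional method can be sketched as follows. This is only an illustration: the function names, the use of (x, y) image coordinates, and the choice of Euclidean distance are assumptions, not details given in the cited document.

```python
import math

def lip_features(a1, a2, a3, b1, b2):
    """Return (W, H, HX) from the five reference points of FIG. 5.

    Each point is an (x, y) tuple; the coordinate system is an assumption.
    """
    w = math.dist(b1, b2)   # horizontal width W: B1 to B2
    h = math.dist(a1, a2)   # vertical width H: A1 to A2
    hx = math.dist(a1, a3)  # upper lip to lower jaw HX: A1 to A3
    return (w, h, hx)

def two_vowel_pattern(first, second):
    """Concatenate the features of two vowels into the six-variable pattern."""
    return first + second

# Hypothetical coordinates for two mouth shapes.
vowel_1 = lip_features((0, 0), (0, 2), (0, 5), (-3, 1), (3, 1))
vowel_2 = lip_features((0, 0), (0, 1), (0, 4), (-4, 0), (4, 0))
pattern = two_vowel_pattern(vowel_1, vowel_2)
```

The six-element tuple corresponds to (W1, H1, HX1, W2, H2, HX2) in the description above.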

Conventional utterance identification methods, including the vowel identification method described above, treat every utterance content, such as “a”, “i”, “1”, and “2”, as distinct when identifying what a speaker uttered.
However, when every utterance content is treated as distinct, the identification error rate is low for a speaker who utters, for example, the five vowels in consistently different ways, but high for a speaker for whom some of the five vowels are hard to tell apart. The conventional method therefore has the problem that the identification error rate depends heavily on individual differences between speakers.

The present invention has been made in view of the above problem, and its object is to provide an utterance identification method that can reduce the identification error rate without being affected by how each speaker utters. A further object is to provide a password verification device using this utterance identification method.

To this end, the utterance identification method of the invention of claim 1 classifies each utterance content of a speaker into groups based on at least the change patterns of the vertical width and horizontal width of the lips for each utterance content, placing different utterance contents whose change patterns are highly similar in the same group and different utterance contents whose change patterns have low similarity in separate groups, and identifies each utterance content of the individual speaker based on this group classification. The method is characterized in that: a plurality of utterance contents are each uttered a fixed number of times; the change pattern is measured that fixed number of times for each of the utterance contents; for each utterance content, an arbitrary number of data items is extracted from the measurement data as group classification creation data, and the remaining measurement data is used as evaluation data; the overlap rate of each utterance content with the other utterance contents is calculated from the statistical distribution of each utterance content, computed from the extracted group classification creation data; the calculated overlap rate is compared with a preset threshold to judge the similarity, and the utterance contents are grouped to create the group classification; the evaluation data is identified based on the created group classification and an identification error rate is calculated; and when the calculated identification error rate is at or below a preset allowable value, the created group classification is accepted and finalized.

As in claim 2, each utterance content of the speaker is preferably detected using the portions of the change pattern in which the vertical and horizontal widths of the lips change little over time.
As in claim 3, the utterance content of claim 2 is a number.

The password verification device of the invention of claim 4 comprises: a detection unit that detects the utterance state of a password uttered by a password registrant; a change pattern measurement unit that measures at least the change pattern of the vertical and horizontal widths of the lips from the utterance state detected by the detection unit; a database in which group classification data on each utterance content of each password registrant is registered in advance; an utterance identification unit that recognizes the password uttered by the password registrant using the utterance identification method of claim 1 or 2, based on the change pattern measured by the change pattern measurement unit and the data registered in the database; and a verification unit that compares the password recognized by the utterance identification unit with a password registered in advance and generates a match/mismatch determination output.

In this configuration, when the detection unit detects the utterance state of the password uttered by the password registrant, the change pattern measurement unit measures, for example, the change pattern of the vertical and horizontal widths of the lips from the detected utterance state. The utterance identification unit recognizes the uttered password using the utterance identification method of the present invention, based on the measured change pattern and the data registered in the database. The verification unit compares the password recognized by the utterance identification unit with the password registered in advance and determines whether they match.

As in claim 5, at the time of password registration, the match probability of the password to be registered is preferably calculated from the group classification data registered in the database for the password registrant, and the validity of that password judged from it. In this case, as in claim 6, when the calculated match probability is higher than a predetermined value, the device preferably instructs the registrant to change the password so that the match probability becomes equal to or lower than the predetermined value.

As described above, according to the utterance identification method of the present invention, different utterance contents whose lip vertical-width and horizontal-width change patterns are highly similar and hard to tell apart are placed in the same group and are not distinguished from each other, so identification errors can be reduced without being affected by individual differences in how people utter.

Furthermore, the password verification device of the present invention reduces password identification errors, which improves the reliability of an authentication device that detects the utterance state to recognize a registrant.

An embodiment of the utterance identification method according to the present invention is described below.
The utterance identification method of the present invention classifies the utterance contents of an individual speaker (Japanese syllables, numbers, and so on) into groups based on at least the change patterns of the vertical width H and horizontal width W of the lips, so that different utterance contents with highly similar change patterns fall in the same group and different utterance contents with dissimilar change patterns fall in separate groups. Each utterance content of the speaker is then identified by selecting among these groups. Note that an “utterance” in the present invention does not necessarily have to be accompanied by voice.

For example, consider the five numbers “1”, “2”, “3”, “4”, and “5”. Suppose that for one speaker the lip change patterns for “1” through “5” are all consistently different, while for another speaker the lip change patterns for “1” and “2” are highly similar and hard to tell apart, and those for “3”, “4”, and “5” have low similarity and are easy to tell apart. In this case, for the first speaker, “1” through “5” are each placed in a separate group. For the second speaker, “1” and “2” are placed in the same group, and “3” through “5” are each placed in a separate group. The utterance content is then identified by determining which of these groups the speaker's utterance belongs to.

For example, suppose the password is “1234”. For a speaker whose “1”, “2”, “3”, “4”, and “5” are classified into separate groups, the password is judged to match when “1234” is uttered. For a speaker whose “1” and “2” are in the same group and whose “3”, “4”, and “5” are in separate groups, “1” and “2” are treated as the same, so “1234”, “2234”, “2134”, and “1134” are all judged to match because “1” and “2” are not distinguished.
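The “1234” example can be reproduced with a small sketch. The function names and the representation of a group classification as a list of sets are assumptions made for illustration; the patent does not prescribe a data structure.

```python
def build_group_map(groups):
    """Map each utterance content to the index of the group containing it."""
    return {content: idx for idx, group in enumerate(groups) for content in group}

def passwords_match(uttered, registered, groups):
    """Compare two passwords position by position at the group level."""
    group_of = build_group_map(groups)
    if len(uttered) != len(registered):
        return False
    for u, r in zip(uttered, registered):
        # Contents outside the classification can never match.
        if u not in group_of or r not in group_of:
            return False
        if group_of[u] != group_of[r]:
            return False
    return True

# First speaker: every number is its own group.
speaker_a = [{"1"}, {"2"}, {"3"}, {"4"}, {"5"}]
# Second speaker: "1" and "2" share a group.
speaker_b = [{"1", "2"}, {"3"}, {"4"}, {"5"}]

accepted_b = [p for p in ("1234", "2234", "2134", "1134", "1235")
              if passwords_match(p, "1234", speaker_b)]
```

For the second speaker the four variants listed in the text are accepted, while a password that differs in a distinguishable digit is still rejected.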

Thus, the utterance identification method of the present invention classifies utterance contents into groups for each speaker, placing hard-to-distinguish utterance contents in the same group rather than forcing a distinction between them, so identification errors can be reduced without being affected by individual differences in lip change patterns during utterance.

Next, an embodiment of a method for creating groups that achieve a predetermined utterance content identification rate is described with reference to the flowchart of FIG. 1.
In step 1 (shown as S1 in the figure; the same applies below), the speaker utters each of k utterance contents M times, and the change patterns of the horizontal width W and vertical width H of the lips are measured as utterance data.
In step 2, the utterance data measured in step 1 is registered.
In step 3, an initial value of the overlap rate P is set; P is the threshold used in the group classification stage described later. The initial value of P is set large, so that even different utterance contents with high similarity are classified into separate groups.

In step 4, N data items are sampled at random from the M utterance data items of each of the k utterance contents registered in step 2, and these are used as group classification creation data. The remaining N′ × k utterance data items (N′ = M − N) are used as evaluation data for the created group classification.
In step 5, the statistical distribution (probability density) of each utterance content is calculated from its N utterance data items extracted in step 4, and for each utterance content the overlap rate Q between its statistical distribution and that of each other utterance content is calculated.
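Steps 4 and 5 can be sketched as below. The sketch assumes that each utterance data item has been reduced to a single scalar feature and that the statistical distribution is summarized by a mean and a sample standard deviation; neither assumption comes from the text, and all names are illustrative.

```python
import random
import statistics

def split_samples(samples_by_content, n, rng):
    """Step 4: pick N items per content at random; the rest are evaluation data."""
    creation, evaluation = {}, {}
    for content, samples in samples_by_content.items():
        chosen = set(rng.sample(range(len(samples)), n))
        creation[content] = [s for i, s in enumerate(samples) if i in chosen]
        evaluation[content] = [s for i, s in enumerate(samples) if i not in chosen]
    return creation, evaluation

def fit_distribution(samples):
    """Step 5 (first half): summarize the samples as (mean, standard deviation)."""
    return statistics.fmean(samples), statistics.stdev(samples)

# Hypothetical data: k = 2 contents, M = 10 utterances each, N = 6.
rng = random.Random(0)
data = {
    "1": [float(i) for i in range(10)],
    "2": [100.0 + i for i in range(10)],
}
creation, evaluation = split_samples(data, 6, rng)
params = {content: fit_distribution(s) for content, s in creation.items()}
```

With M = 10 and N = 6, each content keeps N′ = 4 items for evaluation, which matches the N′ × k count given in step 4.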

FIG. 2 shows an example of statistical distributions. Let A and B be the utterance contents for which the overlap rate Q is calculated, let FA(x) in the figure be the statistical distribution of utterance content A, and let FB(x) be that of utterance content B. Of the region where the two distributions overlap, the portion Qab represents the probability that utterance content A is mistakenly identified as utterance content B, and the portion Qba represents the probability that utterance content B is mistakenly identified as utterance content A. Qab and Qba are calculated as the overlap rate Q.
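One way to compute Qab and Qba is sketched below. It assumes one-dimensional normal distributions for FA(x) and FB(x), assigns each x to whichever density is larger, and integrates numerically; the distribution family, the tie handling, and the integration range are all assumptions rather than details from the patent.

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def overlap_rates(mean_a, std_a, mean_b, std_b, steps=20000):
    """Return (Qab, Qba) using midpoint integration over +/- 8 standard deviations."""
    lo = min(mean_a - 8 * std_a, mean_b - 8 * std_b)
    hi = max(mean_a + 8 * std_a, mean_b + 8 * std_b)
    dx = (hi - lo) / steps
    q_ab = q_ba = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        fa = normal_pdf(x, mean_a, std_a)
        fb = normal_pdf(x, mean_b, std_b)
        if fb > fa:
            q_ab += fa * dx      # A falls where B is more likely: A read as B
        elif fa > fb:
            q_ba += fb * dx      # B falls where A is more likely: B read as A
        else:
            q_ab += 0.5 * fa * dx  # tie: split evenly (assumption)
            q_ba += 0.5 * fb * dx
    return q_ab, q_ba

q_ab, q_ba = overlap_rates(0.0, 1.0, 2.0, 1.0)
```

For two unit-variance distributions whose means are two standard deviations apart, both rates come out near 0.16, and they shrink toward zero as the distributions separate.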

In step 6, the overlap rate Q calculated in step 5 is compared with the overlap rate P set in step 3, and the k utterance contents are grouped based on the comparison result. In the example of FIG. 2, the grouping is determined by the magnitude relationship between Qab, Qba, and P.
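The exact comparison rule for step 6 is not reproduced in the text above, so the following sketch assumes one plausible rule: two utterance contents are merged when the larger of Qab and Qba is at or above the threshold P, and merging is transitive. Both the rule and the names are assumptions.

```python
def group_by_overlap(contents, overlap, threshold):
    """Group contents whose pairwise overlap rate reaches the threshold P.

    overlap maps (a, b) to Qab; missing pairs are treated as 0.0.
    """
    parent = {c: c for c in contents}

    def find(c):
        # Follow parent links to the group representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for i, a in enumerate(contents):
        for b in contents[i + 1:]:
            q = max(overlap.get((a, b), 0.0), overlap.get((b, a), 0.0))
            if q >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for c in contents:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())

# Hypothetical overlap rates for three numbers.
rates = {("1", "2"): 0.30, ("2", "1"): 0.25,
         ("1", "3"): 0.01, ("3", "1"): 0.02}
loose = group_by_overlap(["1", "2", "3"], rates, threshold=0.2)
strict = group_by_overlap(["1", "2", "3"], rates, threshold=0.5)
```

A large P keeps all contents apart, as step 3 intends for the initial value, while a smaller P merges “1” and “2”.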

In step 7, the (N′ × k) evaluation data items are identified using the group classification created in step 6, and the identification error rate is calculated.
In step 8, the identification error rate obtained in step 7 is compared with a preset allowable value for the desired identification error rate. If the error rate is at or below the allowable value, the group classification created in step 6 is accepted, and the process proceeds to step 9, where the group classification for the k utterance contents measured in step 1 is finalized. If the error rate is larger than the allowable value, the process proceeds to step 10.

In step 10, the overlap rate P, which serves as the threshold, is lowered for the utterance contents whose identification error rate was high, so that even utterance contents with lower similarity fall into the same group and the number of groups decreases; the processing from step 4 onward is then repeated. In this way, a group classification that yields an identification error rate at or below the desired value is established.
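The loop formed by steps 4 through 10 can be outlined as follows. This is a simplified sketch: it lowers a single global threshold P by a fixed step, whereas the text lowers P for the utterance contents with a high error rate, and it receives the grouping and evaluation procedures as hypothetical callables `make_groups` and `error_rate`.

```python
def tune_group_classification(make_groups, error_rate, initial_p,
                              allowed_error, step=0.05, min_p=0.0):
    """Lower P until the evaluation error rate is within the allowed value."""
    p = initial_p
    while True:
        groups = make_groups(p)       # steps 4 to 6: sample, fit, group
        err = error_rate(groups)      # step 7: identify the evaluation data
        if err <= allowed_error:      # step 8: compare with the allowable value
            return groups, p, err     # step 9: finalize the classification
        if p <= min_p:
            raise ValueError("no threshold satisfies the allowed error rate")
        p = max(min_p, p - step)      # step 10: lower P and repeat

# Stub procedures standing in for the real measurement pipeline.
def demo_groups(p):
    return [["1", "2"], ["3"]] if p <= 0.3 else [["1"], ["2"], ["3"]]

def demo_error(groups):
    return 0.02 if len(groups) == 2 else 0.2

final_groups, final_p, final_err = tune_group_classification(
    demo_groups, demo_error, initial_p=0.5, allowed_error=0.05, step=0.1)
```

The explicit lower bound on P is an addition of this sketch so that the loop always terminates.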

As described above, the utterance identification method of the present invention classifies utterance contents into groups for each speaker and identifies each speaker's utterance content based on the group classification set for that speaker. Compared with the conventional approach of trying to distinguish every utterance content, it can therefore reduce the identification error rate; in other words, it can raise the identification rate of utterance content.

Next, an example of how the horizontal width W and vertical width H of the lips are measured when the utterance data of step 1 is collected is described, using the utterance of a number.
The case where the utterance content is “0 (ZERO)” is described here.

FIG. 3 shows the change patterns of the vertical width H and horizontal width W of the lips when “0 (ZERO)” is uttered. In the figure, H(i) is the change pattern of the vertical width, W(i) is the change pattern of the horizontal width, and D(i) is the combined amount of change in the vertical and horizontal widths, where i = 1, 2, .... Taking H(0) as the value with the lips closed before the utterance and H′(i) as the value measured during the utterance, H(i) = H′(i) − H(0). Similarly, taking W(0) as the value with the lips closed before the utterance and W′(i) as the value measured during the utterance, W(i) = W′(i) − W(0). The amount of change D(i) is D(i) = ((H′(i) − H(i−1))² + (W′(i) − W(i−1))²)^(1/2).
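The quantities H(i), W(i), and D(i) can be computed as in the sketch below. The formula for D(i) in the text mixes primed and unprimed symbols, so this sketch reads it as the displacement between consecutive measurements, with the closed-lip values acting as frame 0; that reading, like the function name, is an assumption.

```python
import math

def change_patterns(h_closed, w_closed, h_measured, w_measured):
    """Return (H, W, D) lists for frames i = 1, 2, ...

    h_closed, w_closed: H(0) and W(0), measured with the lips closed.
    h_measured, w_measured: H'(i) and W'(i), measured during the utterance.
    """
    h = [hm - h_closed for hm in h_measured]   # H(i) = H'(i) - H(0)
    w = [wm - w_closed for wm in w_measured]   # W(i) = W'(i) - W(0)
    d = []
    prev_h, prev_w = h_closed, w_closed
    for hm, wm in zip(h_measured, w_measured):
        # Combined frame-to-frame change in vertical and horizontal width.
        d.append(math.hypot(hm - prev_h, wm - prev_w))
        prev_h, prev_w = hm, wm
    return h, w, d

# Hypothetical measurements (arbitrary units).
H, W, D = change_patterns(2.0, 40.0, [5.0, 5.0, 9.0], [44.0, 44.0, 41.0])
```

A frame in which the lips hold their shape gives D(i) = 0, which is the property the stable portions of FIG. 3 rely on.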

The portion with a small amount of change before the utterance (portion a in the figure) is the closed-lip state. The next portion with a small amount of change (portion b) is the “E” of “ZE” in the utterance “0 (ZERO)”, and the portion with a small amount of change after that (portion c) is the “O” of “RO”. The portion with a small amount of change after the utterance (portion d) is again the closed-lip state and is used to detect the completion of the utterance.

In this way, for the vowels “A”, “I”, “U”, “E”, and “O” and for the closed-lip state, the amount of change D(i) is small and the change pattern during utterance is stable. Therefore, when measuring the utterance data for “ZE” and “RO” of the utterance content “ZERO”, the low-change portions b and c are measured, and “ZE” and “RO” are detected from them and used as utterance data.
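Finding the low-change portions a to d can be sketched as a scan for runs of frames where D(i) stays below a threshold. The threshold value and the minimum run length are hypothetical parameters; the text does not say how “small” is decided.

```python
def stable_segments(d, threshold, min_len):
    """Return (start, end) index pairs of runs where d stays below threshold."""
    segments = []
    start = None
    for i, value in enumerate(d):
        if value < threshold:
            if start is None:
                start = i            # a low-change run begins here
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(d) - start >= min_len:
        segments.append((start, len(d) - 1))
    return segments

# Hypothetical D(i) trace: three stable runs separated by lip movement.
trace = [0.1, 0.1, 0.1, 3.0, 4.0, 0.2, 0.1, 0.2, 5.0, 0.1, 6.0, 0.1, 0.1, 0.1]
runs = stable_segments(trace, threshold=1.0, min_len=3)
```

The H and W values inside each returned run would then serve as the utterance data for that portion, and a short dip such as the single frame at index 9 is ignored.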

If, as in this embodiment, low-change portions such as vowels are measured as the data for the vertical width H and horizontal width W of the lips, the variation in utterance data for the same utterance content is reduced, which reduces identification errors caused by variation between utterances of the same speaker. In addition, because this embodiment identifies the utterance content from the change patterns of the vertical and horizontal widths of the lips, an image containing only the area around the lips suffices, and the image range needed for identification is narrower than in the conventional identification method that also uses the position of the lower jaw. This has the advantage of requiring less image data than the conventional method.

Next, the password verification device of the present invention, which uses the utterance identification method of the present invention, is described.
FIG. 4 is a block diagram showing an embodiment of the password verification device of the present invention.
In FIG. 4, the password verification device of this embodiment comprises an input unit 1 for entering the speaker's personal data, a camera 2 serving as imaging means for capturing images of the speaker, an image processing unit 3 that processes the images captured by the camera 2, an utterance identification unit 4 that identifies the utterance content from the image processing data of the image processing unit 3 and recognizes the input password, a database 5 that stores the utterance registration data of each speaker registered in advance, a memory 6 that stores the password of each speaker registered in advance, and a verification unit 7 that compares the input password identified by the utterance identification unit 4 with the registered password stored in the memory 6.

The input unit 1 is used to enter, at authentication time, the personal information of a speaker registered in advance.
The camera 2 captures a face image of the speaker seeking authentication to detect the utterance state, and corresponds to the detection unit.
The image processing unit 3 extracts the image around the lips from the image captured by the camera 2 and measures the change patterns of the vertical width H and horizontal width W of the lips. It therefore corresponds to the change pattern measurement unit.
The utterance identification unit 4 reads from the database 5 the registration data corresponding to the personal data entered at the input unit 1. Based on the registration data it has read and the change pattern data received from the image processing unit 3, it identifies the uttered content using the utterance identification method described above and recognizes the input password.

The database 5 stores, as registration data associated with each speaker's personal data, the group classification data for each speaker produced at registration time by the group classification setting method described above.
The memory 6 stores the registered password of each speaker registered in advance, in association with the speaker's personal data.
The verification unit 7 reads from the memory 6 the registered password corresponding to the personal data entered at the input unit 1, compares it with the input password recognized by the utterance identification unit 4, and generates a match/mismatch determination output.

Next, the operation of the password verification device of this embodiment is described.
A speaker seeking authentication enters his or her pre-registered personal data at the input unit 1 and utters his or her registered password in front of the camera 2. The camera 2 captures images of the speaker and sends them to the image processing unit 3. The image processing unit 3 extracts the image around the lips from the received images, measures the change pattern from the lip movement, and sends the measurement data to the utterance identification unit 4. The utterance identification unit 4 has already read from the database 5, based on the personal data entered at the input unit 1, the group classification data used to identify this speaker's utterances. When the measurement data arrives from the image processing unit 3, it identifies the uttered content based on that group classification data, recognizes the input password, and sends the recognized input password to the verification unit 7. The verification unit 7 has likewise already read the speaker's registered password from the memory 6, based on the personal data entered at the input unit 1. When the input password arrives from the utterance identification unit 4, the verification unit compares it with the registered password; it generates an authentication-OK determination output if they match and an authentication-rejected determination output if they do not.

For example, suppose a speaker has registered “1234” as the password, and that in this speaker's change patterns “1” and “2” are in the same group, while “3” and “4” are each in a separate group, distinct from each other and from the group of “1” and “2”. In this case, when the speaker utters “1234”, even if the utterance identification unit 4 were to identify it as “1134”, “2134”, or “2234” rather than “1234”, the input password passed to the verification unit 7 is “(1 or 2)(1 or 2)34” because “1” and “2” are in the same group, and it is judged to match. This raises the probability of a match determination in the verification unit 7.
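The data flow between the database 5, the memory 6, and the verification unit 7 can be sketched as a small class. The class, its method names, and the use of dictionaries keyed by a personal ID are illustrative assumptions; image capture and lip measurement are outside the sketch, which starts from the contents already identified by the utterance identification unit.

```python
from dataclasses import dataclass, field

@dataclass
class PasswordVerifier:
    database: dict = field(default_factory=dict)  # personal ID -> group classification
    memory: dict = field(default_factory=dict)    # personal ID -> registered password

    def register(self, person_id, groups, password):
        self.database[person_id] = groups
        self.memory[person_id] = password

    def to_group_sequence(self, person_id, contents):
        """Express contents as group indices, e.g. "(1 or 2)(1 or 2)34"."""
        group_of = {c: idx
                    for idx, group in enumerate(self.database[person_id])
                    for c in group}
        return [group_of.get(c) for c in contents]

    def verify(self, person_id, identified):
        """Return True for a match output, False for a mismatch output."""
        if person_id not in self.memory:
            return False
        uttered = self.to_group_sequence(person_id, identified)
        registered = self.to_group_sequence(person_id, self.memory[person_id])
        # None marks a content that belongs to no registered group.
        return None not in uttered and uttered == registered

verifier = PasswordVerifier()
verifier.register("user-001", [["1", "2"], ["3"], ["4"]], "1234")
result = verifier.verify("user-001", "2134")
```

Here “2134” produces the same group sequence as the registered “1234”, so the verification step reports a match, as in the example above.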

With a method such as that of the present invention, which identifies utterance content by grouping according to the similarity of lip movement change patterns, the amount of information that is effective as, for example, a password differs with each speaker's manner of utterance. For a person whose manner of utterance places many utterance contents in the same group, the amount of information effective as a password is reduced.

Specifically, taking the digits 0 to 9 as an example, consider one person who stably utters the digits in nine distinguishable ways, {0}, {1, 2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, and another who utters them in only five distinguishable ways, {0, 3, 4, 6}, {1, 2}, {5, 9}, {7}, {8}. Suppose the password is chosen from the dates in the 10,000 years from January 1, 0000 to December 31, 9999 (3.65 × 10⁶ possibilities), and that the eight-digit string for October 25, 1989 (19891025) is selected. For the former person, the four strings (19891025), (19891015), (29891015), and (29891025) are recognized as identical, so the probability of a match is 4 / (3.65 × 10⁶), or about 10⁻⁶. For the latter person, 2 (thousands digit of the year) × 2 (hundreds digit of the year) × 1 (tens digit of the year) × 2 (ones digit of the year) × 1 (tens digit of the month) × 4 (ones digit of the month) × 2 (tens digit of the day) × 2 (ones digit of the day) = 128 strings are recognized as identical, so the probability of a match is 128 / (3.65 × 10⁶), or about 3.5 × 10⁻⁵. In other words, the former person has nine kinds of information effective as a password, whereas the latter person, whose digits fall into fewer groups, has only five, and the probability that an arbitrarily entered eight-digit number matches the password is correspondingly higher.
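The counting in this example can be reproduced with a short sketch. It is an illustration only, not part of the disclosed device: the names `NINE_GROUPS`, `FIVE_GROUPS`, `ALLOWED`, `indistinguishable_count`, and `match_probability` are hypothetical. It also assumes that each digit position is restricted independently (tens digit of the month is 0 or 1, tens digit of the day is 0 to 3), which is the restriction under which the counts 4 and 128 stated above come out; it does not reject combined values such as month 13.

```python
from math import prod

# Digit groups of the two speakers in the example above.
NINE_GROUPS = [{0}, {1, 2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}]
FIVE_GROUPS = [{0, 3, 4, 6}, {1, 2}, {5, 9}, {7}, {8}]

# Assumption: digits permitted at each position of YYYYMMDD, checked
# position by position only (month tens: 0-1, day tens: 0-3).
ANY = set(range(10))
ALLOWED = [ANY, ANY, ANY, ANY, {0, 1}, ANY, {0, 1, 2, 3}, ANY]

DATE_SPACE = 3.65e6  # number of dates in 10,000 years, as stated in the text


def group_of(digit, groups):
    """Return the group that contains the given digit."""
    for group in groups:
        if digit in group:
            return group
    raise ValueError(f"digit {digit} is not in any group")


def indistinguishable_count(password, groups):
    """Count the 8-digit strings recognized as identical to the password."""
    return prod(
        len(group_of(int(ch), groups) & allowed)
        for ch, allowed in zip(password, ALLOWED)
    )


def match_probability(password, groups):
    """Probability that an arbitrarily entered date matches the password."""
    return indistinguishable_count(password, groups) / DATE_SPACE


if __name__ == "__main__":
    for name, groups in (("nine groups", NINE_GROUPS), ("five groups", FIVE_GROUPS)):
        n = indistinguishable_count("19891025", groups)
        print(name, n, f"{match_probability('19891025', groups):.2e}")
```

Keeping the per-position restriction in a separate table makes it easy to see which factor in the product each digit contributes.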

Accordingly, in the password verification device shown in FIG. 4, which employs the utterance identification method of the present invention described above, it is desirable at the time of password registration to calculate the matching probability of the password to be registered, based on the group classification data obtained by measuring how the password registrant speaks, and to judge the validity of the password from that probability. Furthermore, when the calculated matching probability is higher than a desired predetermined matching probability, it is desirable to instruct the registrant to change the content of the password to be registered so that its matching probability is no higher than the predetermined value.

For example, as described above for the digits 0 to 9, when an arbitrary date within the 10,000 years is registered as a password, a person whose digits fall into five groups has a higher probability than a person whose digits fall into nine groups that an arbitrarily entered eight-digit number matches the password, so the password is less effective. In such a case, a person with five groups is guided at registration to register a password in which easily remembered digits, such as an employee number or telephone number, are added to the date, in order to lower the matching probability. This improves the reliability of the password verification device of the present invention.
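The registration check described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the device's actual logic: the function names, the threshold value `1e-6`, and the appended employee number `"7781"` are all hypothetical, and the appended digits are assumed to be unconstrained (any digit 0 to 9 at each position). The value 128 and the date space 3.65 × 10⁶ are taken from the example in the text.

```python
from math import prod

# Five-group speaker from the example in the text.
FIVE_GROUPS = [{0, 3, 4, 6}, {1, 2}, {5, 9}, {7}, {8}]


def group_size_by_digit(groups):
    """Map each digit to the size of the group it belongs to."""
    return {digit: len(group) for group in groups for digit in group}


def match_probability(date_count, date_space, extra_digits, groups):
    """Matching probability of a date password with extra digits appended.

    date_count   -- number of dates recognized as identical to the date part
    date_space   -- number of possible dates
    extra_digits -- string of appended digits (may be empty)
    """
    sizes = group_size_by_digit(groups)
    extra_count = prod(sizes[int(ch)] for ch in extra_digits)
    return (date_count * extra_count) / (date_space * 10 ** len(extra_digits))


def registration_advice(probability, threshold):
    """Accept the password, or ask the registrant to change it."""
    return "accept" if probability <= threshold else "change password"


if __name__ == "__main__":
    THRESHOLD = 1e-6  # hypothetical; the text does not give a value
    p_date_only = match_probability(128, 3.65e6, "", FIVE_GROUPS)
    p_extended = match_probability(128, 3.65e6, "7781", FIVE_GROUPS)
    print(registration_advice(p_date_only, THRESHOLD))
    print(registration_advice(p_extended, THRESHOLD))
```

With these assumed values, the date alone exceeds the threshold, while the date plus four appended digits falls below it, which is the effect the guidance at registration is meant to achieve.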

In the above embodiment, the utterance contents are grouped using the change patterns of the vertical width and horizontal width of the lips during utterance. Needless to say, the distance from the upper lip to the lower jaw may also be used in addition to the vertical and horizontal widths of the lips.

FIG. 1 is a flowchart illustrating an embodiment of the group classification confirmation method of the utterance identification method according to the present invention.
FIG. 2 is a diagram showing an example of a statistical distribution.
FIG. 3 is a diagram showing an example of the change patterns of the vertical width H and horizontal width W of the lips when digits are uttered.
FIG. 4 is a block diagram showing an embodiment of the password verification device according to the present invention.
FIG. 5 is an explanatory diagram of a conventional utterance identification method.

Explanation of symbols

1 Input unit
2 Camera
3 Image processing unit
4 Utterance identification unit
5 Database
6 Memory
7 Collation unit

Claims

An utterance identification method in which, based on at least the change patterns of the vertical width and horizontal width of the lips for each utterance content of a speaker, different utterance contents whose change patterns are highly similar are placed in the same group and different utterance contents whose change patterns have low similarity are placed in separate groups, and each utterance of the speaker is identified on the basis of this group classification,
wherein a plurality of utterance contents are each uttered a fixed number of times and the change pattern is measured the fixed number of times for each utterance content; for each utterance content, an arbitrary number of the measured data are extracted as group classification creation data and the remaining measured data are used as evaluation data; for each utterance content, an overlap rate with each other utterance content is calculated from the statistical distribution of each utterance content computed from the extracted group classification creation data; the calculated overlap rate is compared with a preset threshold to judge similarity, and the utterance contents are grouped accordingly to create a group classification; the evaluation data are identified on the basis of the created group classification to calculate an identification error rate; and when the calculated identification error rate is equal to or less than a preset allowable value, the created group classification is confirmed as the group classification.

The utterance identification method according to claim 1, wherein each utterance of the speaker is detected using time segments of the change pattern in which the vertical width and horizontal width of the lips change little.

The utterance identification method according to claim 2, wherein the utterance content is a number.

A password verification device comprising:
a detection unit that detects the utterance state of a password spoken by a password registrant;
a change pattern measuring unit that measures the change pattern of at least the vertical width and horizontal width of the lips from the utterance state detected by the detection unit;
a database in which group classification data on each utterance content is registered in advance for each password registrant;
an utterance identification unit that, based on the change pattern measured by the change pattern measuring unit and the data registered in the database, recognizes the password spoken by the password registrant using the utterance identification method according to claim 1 or 2; and
a collation unit that collates the password recognized by the utterance identification unit with a pre-registered password and generates a match or mismatch determination output.

The password verification device according to claim 4, wherein, at the time of password registration, the matching probability of the password to be registered is calculated on the basis of the password registrant's group classification data registered in the database, and the validity of the password to be registered is determined.

The password verification device according to claim 5, wherein, when the calculated matching probability is higher than a predetermined value, the device instructs a change to the content of the password to be registered so that the matching probability becomes equal to or lower than the predetermined value.