CN113035197A

CN113035197A - Speech recognition system based on artificial intelligence algorithm

Info

Publication number: CN113035197A
Application number: CN202110268134.1A
Authority: CN
Inventors: 杜金林
Original assignee: Translated By Mdt Infotech Ltd Shanghai
Current assignee: Translated By Mdt Infotech Ltd Shanghai
Priority date: 2021-03-11
Filing date: 2021-03-11
Publication date: 2021-06-25

Abstract

The invention provides a speech recognition system based on an artificial intelligence algorithm. The system includes a user interface for displaying content; a voice receiving module for receiving a voice signal; a speech recognition module for recognizing the voice signal; a comparison module for detecting the accuracy of the decoding result; a camera module for providing an image signal of the user; and an intention judgment module which, based on the image signal captured by the camera module and the recognized utterance, determines whether the user intends to operate the controlled device. The speech recognition system provided by the invention has high recognition accuracy and can determine whether the user intends to operate it.

Description

Speech recognition system based on artificial intelligence algorithm

Technical Field

The invention relates to the technical field of voice recognition, in particular to a voice recognition system based on an artificial intelligence algorithm.

Background

With the progress of data processing technology and the rapid spread of the mobile internet, computer technology is widely applied across society, and massive amounts of data are generated. Among these data, voice data is receiving more and more attention. Speech recognition is an interdisciplinary field. Over the last two decades, speech recognition technology has made significant progress and has begun to move from the laboratory to the market. It is expected that within the next 10 years speech recognition technology will enter fields such as industry, home appliances, communications, automotive electronics, medical care, home services and consumer electronics. The application of speech recognition dictation machines in some fields was rated by the U.S. press as one of the ten major computer developments of 1997, and many experts consider speech recognition to be one of the ten important technologies in the information technology field from 2000 to 2010. The fields involved in speech recognition include signal processing, pattern recognition, probability and information theory, the mechanisms of sound production and hearing, and artificial intelligence. Speech recognition is technically complex, but it is more widely used than speech synthesis. Its greatest advantage is that it makes the human-machine interface more natural and easier to use.

With the rapid development of microelectronic and communication technology, embedded communication devices such as mobile phones have become almost indispensable in people's work and daily life, and users' requirements for the functions of these devices keep rising. Applying speech technology to such devices has therefore become a research hotspot. However, the accuracy of existing voice functions is not high; moreover, voice recognition is frequently triggered by mistake, and existing systems cannot determine whether the user actually intends to use voice recognition.

Therefore, it is necessary to provide a new speech recognition system based on an artificial intelligence algorithm to solve the above technical problems.

Disclosure of Invention

The technical problem solved by the invention is to provide a speech recognition system based on an artificial intelligence algorithm that has high recognition accuracy and can determine whether the user intends to operate it.

To solve the above technical problems, the speech recognition system based on an artificial intelligence algorithm provided by the invention comprises:

a user interface for displaying content;

a voice receiving module for receiving a voice signal;

a speech recognition module for recognizing the voice signal, the speech recognition module comprising:

a signal conversion module, a feature extraction module, an encoding module, a codebook module and a decoding operation module;

the signal conversion module is used for converting the voice signal into a digital signal;

the feature extraction module is used for dividing the digital signal into frames, extracting the feature parameters of each frame of the digital signal and obtaining a feature vector sequence;

the encoding module is used for converting the feature vector sequence into a feature codeword sequence;

the codebook module stores, for each codeword, the probability values with which it matches the codewords in the codebook;

the decoding operation module is used for performing a decoding operation on the feature codeword sequence to obtain a recognition result; in this operation, for each codeword in the feature codeword sequence, the codeword with the maximum matching probability is looked up directly in the codebook module to obtain the decoding result;

a comparison module for detecting the accuracy of the decoding result;

a camera module for providing an image signal of the user;

and an intention judgment module for judging, according to the image signal captured by the camera module and the recognized voice, whether the user has an intention to operate the controlled device.
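The description does not say which feature parameters are extracted from each frame. As a minimal sketch, assuming a 16 kHz digital signal, 25 ms frames with a 10 ms hop, and two illustrative per-frame features (log energy and zero-crossing rate) standing in for whatever parameters the inventors use (MFCCs would be a common real-world choice), the framing and feature extraction steps could look like this. All function names and parameter values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[Vector]:
    """Split the digital signal into overlapping frames; a trailing partial frame is dropped."""
    if frame_len < 2 or hop < 1:
        raise ValueError("need frame_len >= 2 and hop >= 1")
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


def frame_features(frame: Sequence[float]) -> Vector:
    """Illustrative per-frame feature vector: [log energy, zero-crossing rate]."""
    energy = sum(x * x for x in frame)
    log_energy = math.log(energy + 1e-10)  # offset avoids log(0) on digital silence
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [log_energy, crossings / (len(frame) - 1)]


def extract_feature_sequence(samples: Sequence[float], frame_len: int = 400,
                             hop: int = 160) -> List[Vector]:
    """Feature vector sequence: one vector per frame (defaults: 25 ms / 10 ms at 16 kHz)."""
    return [frame_features(f) for f in frame_signal(samples, frame_len, hop)]
```

For one second of a 440 Hz tone sampled at 16 kHz this yields 98 frames, each with a zero-crossing rate of about 0.055 (two crossings per cycle: 2 × 440 / 16000). The resulting list of vectors is what the encoding module would receive as the feature vector sequence.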

Preferably, the codebook is a Gaussian codebook.

Preferably, the encoding module converts the feature vector sequence into a feature codeword sequence according to the following steps:

S1: dividing the feature vector sequence into a plurality of subspaces, each subspace corresponding to one codebook;

S2: calculating the distance measure between each feature vector in each subspace and each codeword in the corresponding codebook, and taking the codeword with the smallest distance measure to the feature vector as the codeword corresponding to that feature vector in the feature codeword sequence;

S3: combining the codewords corresponding to all vectors in each subspace of the feature vector sequence in the order of the original vector sequence to obtain the corresponding feature codeword sequence.
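Steps S1 to S3 do not define what a "subspace" is or which distance measure is used. A minimal sketch, assuming each subspace is a contiguous slice of the feature dimensions (as in split or product vector quantization) and assuming squared Euclidean distance, is shown below. The names `split_vector` and `encode` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = List[Vector]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance measure assumed here: squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_vector(vec: Sequence[float], dims: Sequence[int]) -> List[Vector]:
    """S1: cut one feature vector into sub-vectors, one per subspace."""
    if sum(dims) != len(vec):
        raise ValueError("subspace sizes must add up to the vector length")
    parts, start = [], 0
    for d in dims:
        parts.append(list(vec[start:start + d]))
        start += d
    return parts


def encode(features: Sequence[Sequence[float]], codebooks: Sequence[Codebook],
           dims: Sequence[int]) -> List[Tuple[int, ...]]:
    """Map each feature vector to a tuple of codeword indices (one per subspace)."""
    if len(codebooks) != len(dims):
        raise ValueError("need exactly one codebook per subspace")
    sequence = []
    for vec in features:  # S3: results stay in the original vector order
        indices = []
        for sub, book in zip(split_vector(vec, dims), codebooks):
            # S2: the codeword with the smallest distance measure is chosen.
            nearest = min(range(len(book)), key=lambda k: squared_distance(sub, book[k]))
            indices.append(nearest)
        sequence.append(tuple(indices))
    return sequence
```

For example, with a 2-D codebook `[[0, 0], [10, 10]]` and a 1-D codebook `[[0], [5]]`, the vectors `[1, 1, 4]` and `[9, 8, 0]` encode to `[(0, 1), (1, 0)]`. Storing small index tuples instead of raw vectors is what makes this kind of quantization compact.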

Preferably, the codebook module is generated by the following steps:

L1: calculating the mean vector and variance vector corresponding to each codeword in the Gaussian codebook;

L2: using the mean and variance vectors, calculating the log-probability value with which each codeword in the feature codebook matches each codeword in the Gaussian codebook;

L3: storing the probability values with which all codewords in the feature codebook match all codewords in the Gaussian codebook to obtain the codebook module.
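Steps L1 to L3 leave the Gaussian form and the training data unspecified. The sketch below assumes diagonal-covariance Gaussians, estimates each Gaussian codeword's mean and variance vectors from training vectors assigned to it (L1), tabulates the log-probability of every feature codeword under every Gaussian codeword (L2, L3), and then shows the decoding operation as a plain table lookup that picks the Gaussian codeword with the highest stored log-probability. All names and the variance floor are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_and_variance(vectors: Sequence[Sequence[float]],
                      floor: float = 1e-6) -> Tuple[Vector, Vector]:
    """L1: per-dimension mean and (floored) variance of one codeword's training vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dims)]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, floor)
           for d in range(dims)]
    return mean, var


def diag_gaussian_log_pdf(x: Sequence[float], mean: Sequence[float],
                          var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def build_codebook_table(feature_codebook: Sequence[Sequence[float]],
                         means: Sequence[Vector],
                         variances: Sequence[Vector]) -> List[List[float]]:
    """L2 and L3: table[i][j] = log p(feature codeword i | Gaussian codeword j)."""
    return [[diag_gaussian_log_pdf(c, m, v) for m, v in zip(means, variances)]
            for c in feature_codebook]


def decode(codeword_indices: Sequence[int], table: List[List[float]]) -> List[int]:
    """Decoding by lookup: best-matching Gaussian codeword for each feature codeword."""
    return [max(range(len(table[i])), key=table[i].__getitem__)
            for i in codeword_indices]
```

Because every probability is computed once when the table is built, decoding needs no Gaussian evaluations at run time, only an index lookup and a maximum per codeword.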

Preferably, the comparison module stores a plurality of commonly used specific sentence texts, and compares the result recognized by the speech recognition module with the specific sentence texts to judge the recognition accuracy of the speech recognition module.
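The source does not say how the recognized text is compared with the stored sentence texts. One plausible reading, sketched here under that assumption, is to find the closest stored sentence by character-level edit (Levenshtein) distance and report accuracy as one minus the normalized distance. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def compare_with_references(recognized: str,
                            references: Sequence[str]) -> Tuple[str, float]:
    """Return the closest stored sentence and an accuracy score in [0, 1]."""
    if not references:
        raise ValueError("no reference sentences stored")
    best = min(references, key=lambda ref: edit_distance(recognized, ref))
    errors = edit_distance(recognized, best)
    accuracy = max(0.0, 1.0 - errors / max(len(best), 1))
    return best, accuracy
```

For instance, a recognition result of "turn on the lihgt" matches the stored sentence "turn on the light" with two character errors out of 17, giving an accuracy of about 0.88.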

Preferably, the camera module focuses on identifying the user's eye focus and lip movement.

Preferably, when the intention judgment module determines that an operation intention exists, it also determines a reliability indicating the degree of that operation intention.
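How the eye-focus and lip-movement cues are turned into an intention decision and a reliability value is not disclosed. The following sketch assumes hypothetical per-frame boolean cues from the camera module, aligned with per-frame voice activity, and combines them with an assumed weight and threshold. It illustrates the idea only and is not the inventors' method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class IntentionResult:
    has_intention: bool
    reliability: float  # 0.0 = no supporting evidence, 1.0 = all cues agree


def judge_intention(gaze_on_device: Sequence[bool],
                    lips_moving: Sequence[bool],
                    speech_active: Sequence[bool],
                    gaze_weight: float = 0.5,
                    threshold: float = 0.5) -> IntentionResult:
    """Combine per-frame camera cues over the frames in which speech was detected."""
    if not (len(gaze_on_device) == len(lips_moving) == len(speech_active)):
        raise ValueError("cue sequences must be frame-aligned")
    speech_frames = [i for i, active in enumerate(speech_active) if active]
    if not speech_frames:
        return IntentionResult(False, 0.0)
    # Fraction of speech frames in which the user looks at the device.
    gaze_ratio = sum(gaze_on_device[i] for i in speech_frames) / len(speech_frames)
    # Fraction of speech frames with lip movement, i.e. evidence that
    # the recognized voice comes from the user in front of the camera.
    lip_ratio = sum(lips_moving[i] for i in speech_frames) / len(speech_frames)
    reliability = gaze_weight * gaze_ratio + (1.0 - gaze_weight) * lip_ratio
    return IntentionResult(reliability >= threshold, reliability)
```

Restricting the ratios to frames with detected speech keeps unrelated glances at the device, made while nobody is talking, from inflating the reliability.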

Preferably, the system further includes a control state changing module which, when the intention judgment module determines that there is no operation intention, changes the control state of the controlled device in a direction less noticeable to the user than when it determines that there is an operation intention.

Preferably, when the reliability determined by the intention judgment module is low, the control state changing module changes the control state of the controlled device in a direction less noticeable to the user than when the reliability is high.

Preferably, when recognition of the user's utterance fails, the control state changing module causes the controlled device to notify the user of the recognition failure, and when the reliability of the operation intention behind the utterance is low, it changes the notification in a direction less noticeable to the user than when the reliability is high.
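The control state changing module is described only in terms of relative noticeability. A minimal sketch, assuming three hypothetical feedback levels and an assumed reliability cut-off, shows one policy that satisfies the stated orderings: no intention is quieter than intention, and low reliability is quieter than high reliability, for both ordinary feedback and recognition-failure notifications.

```python
from enum import IntEnum
from typing import Optional


class Prominence(IntEnum):
    """Hypothetical feedback levels, ordered from least to most noticeable."""
    SILENT = 0  # no audible or visible change
    SUBTLE = 1  # e.g. a small on-screen indicator only
    FULL = 2    # e.g. a spoken prompt plus an on-screen message


def choose_prominence(has_intent: bool, reliability: float,
                      high_reliability: float = 0.8) -> Prominence:
    """Less noticeable without intent, and with low rather than high reliability."""
    if not has_intent:
        return Prominence.SILENT
    if reliability >= high_reliability:
        return Prominence.FULL
    return Prominence.SUBTLE


def failure_notification(recognition_failed: bool, has_intent: bool,
                         reliability: float) -> Optional[Prominence]:
    """Prominence of a recognition-failure notice, or None if recognition succeeded."""
    if not recognition_failed:
        return None
    return choose_prominence(has_intent, reliability)
```

Using an `IntEnum` keeps the levels directly comparable, so the orderings required by the text can be checked with ordinary `<` comparisons.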

Compared with the related art, the speech recognition system based on an artificial intelligence algorithm provided by the invention has the following beneficial effects:

When the set of speech feature vectors is clustered to obtain a codebook, a step is added that dynamically merges and splits subsets according to the number of vectors in each subset and the total distance measure of the vectors in that subset. This reduces the sum of the distance measures between the clustered vectors and their corresponding codewords, improves the precision of the clustering algorithm, preserves the recognition performance of the speech system, and greatly reduces the storage requirements of the system. In addition, when it is determined that the user has no operation intention, the control state of the controlled device is changed in a direction less noticeable to the user than when the user is determined to have an operation intention, which makes the system more comfortable to use.
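The beneficial effects mention dynamically merging and splitting subsets during codebook clustering, but the procedure itself is not disclosed. The sketch below substitutes an ISODATA-style variant of k-means: subsets with too few vectors are dissolved so their vectors merge into the nearest remaining subsets, and a subset whose total squared distance to its centroid exceeds a threshold is split in two along its highest-variance dimension. All thresholds, names and the splitting rule are assumptions. The last iteration only re-estimates centroids, so the returned codewords match the final assignment.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Sequence[float]]) -> Vector:
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def train_codebook(vectors: Sequence[Sequence[float]], init_k: int, iters: int = 20,
                   min_size: int = 2, max_total_dist: float = 10.0,
                   max_codewords: int = 64, seed: int = 0) -> List[Vector]:
    """k-means with ISODATA-style merge and split steps (hypothetical thresholds)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(list(vectors), init_k)]
    for it in range(iters):
        adjust = it < iters - 1  # last pass: plain centroid update only
        # Assignment: every vector joins the subset of its nearest codeword.
        clusters: List[List[Sequence[float]]] = [[] for _ in centers]
        for v in vectors:
            nearest = min(range(len(centers)), key=lambda i: sqdist(v, centers[i]))
            clusters[nearest].append(v)
        nonempty = [c for c in clusters if c]
        # Merge: dissolve undersized subsets; their vectors are absorbed by
        # the nearest surviving codeword on the next pass.
        kept = [c for c in nonempty if len(c) >= min_size] if adjust else nonempty
        kept = kept or nonempty
        new_centers: List[Vector] = []
        for idx, c in enumerate(kept):
            center = centroid(c)
            total = sum(sqdist(v, center) for v in c)
            # Codewords still available after reserving one for each remaining subset.
            room = max_codewords - len(new_centers) - (len(kept) - idx)
            # Split: a subset with too large a total distance becomes two subsets.
            if adjust and total > max_total_dist and len(c) >= 2 * min_size and room >= 1:
                var = [sum((v[d] - center[d]) ** 2 for v in c) / len(c)
                       for d in range(len(center))]
                d_max = max(range(len(var)), key=var.__getitem__)
                offset = 0.5 * math.sqrt(var[d_max])
                low, high = list(center), list(center)
                low[d_max] -= offset
                high[d_max] += offset
                new_centers.extend([low, high])
            else:
                new_centers.append(center)
        centers = new_centers
    return centers
```

Starting from a single codeword on two well-separated groups of points, the split step produces one codeword per group without the group count being supplied in advance; the merge step keeps sparsely populated codewords from wasting codebook storage.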

Detailed Description

The present invention will be further described with reference to the following embodiments.

A speech recognition system based on an artificial intelligence algorithm, comprising:

a user interface for displaying content;

a voice receiving module for receiving a voice signal;

a camera module for providing an image signal of the user;

The codebook is a Gaussian codebook.

The encoding module converts the feature vector sequence into a feature codeword sequence according to steps S1 to S3 described above.

The codebook module is generated according to steps L1 to L3 described above.

The comparison module stores a plurality of commonly used specific sentence texts, compares the result recognized by the speech recognition module with these texts, and thereby judges the recognition accuracy of the speech recognition module.

The camera module focuses on identifying the user's eye focus and lip movement.

When the intention judgment module determines that an operation intention exists, it also determines a reliability indicating the degree of that operation intention.

The system further includes a control state changing module which, when the intention judgment module determines that there is no operation intention, changes the control state of the controlled device in a direction less noticeable to the user than when it determines that there is an operation intention.

When the reliability determined by the intention judgment module is low, the control state changing module changes the control state of the controlled device in a direction less noticeable to the user than when the reliability is high.

When recognition of the user's utterance fails, the control state changing module causes the controlled device to notify the user of the recognition failure, and when the reliability of the operation intention behind the utterance is low, it changes the notification in a direction less noticeable to the user than when the reliability is high.

The above description is only an embodiment of the present invention and is not intended to limit its scope. All equivalent structural or process modifications made using the contents of this specification, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of the present invention.

Claims

1. A speech recognition system based on an artificial intelligence algorithm, characterized in that it comprises:

a user interface for displaying content;

a voice receiving module, the voice receiving module being used for receiving a voice signal;

a speech recognition module, the speech recognition module being used for recognizing the voice signal and comprising:

a signal conversion module, a feature extraction module, an encoding module, a codebook module and a decoding operation module;

the feature extraction module is used for dividing the digital signal into frames, extracting the feature parameters of each frame of the digital signal and obtaining a feature vector sequence;

the encoding module is used for converting the feature vector sequence into a feature codeword sequence;

the codebook module stores, for each codeword, the probability values with which it matches the codewords in the codebook;

the decoding operation module is used for performing a decoding operation on the feature codeword sequence to obtain the recognition result; in this operation, for each codeword in the feature codeword sequence, the codeword with the maximum matching probability is looked up directly in the codebook module to obtain the decoding result;

a comparison module, the comparison module being used for detecting the accuracy of the decoding result;

a camera module, the camera module being used for providing an image signal of the user;

an intention judgment module, the intention judgment module judging, according to the image signal captured by the camera module and the recognized speech, whether the user has an intention to operate the controlled device.

2. The speech recognition system based on an artificial intelligence algorithm according to claim 1, characterized in that the codebook is a Gaussian codebook.

3. The speech recognition system based on an artificial intelligence algorithm according to claim 1, characterized in that the encoding module converts the feature vector sequence into the feature codeword sequence according to the following steps:

S1: dividing the feature vector sequence into a plurality of subspaces, each subspace corresponding to one codebook;

S2: calculating the distance measure between each feature vector in each subspace and each codeword in the corresponding codebook, and taking the codeword with the smallest distance measure to a feature vector as the codeword corresponding to that feature vector in the feature codeword sequence;

S3: combining the codewords corresponding to all vectors in each subspace of the feature vector sequence in the order of the original vectors to obtain the corresponding feature codeword sequence.

4. The speech recognition system based on an artificial intelligence algorithm according to claim 2, characterized in that the codebook module is generated by the following steps:

L1: calculating the mean vector and variance vector corresponding to each codeword in the Gaussian codebook;

L2: using the mean and variance vectors, calculating the log-probability value with which each codeword in the feature codebook matches each codeword in the Gaussian codebook;

L3: storing the probability values with which all codewords in the feature codebook match all codewords in the Gaussian codebook to obtain the codebook module.

5. The speech recognition system based on an artificial intelligence algorithm according to claim 1, characterized in that the comparison module stores a plurality of commonly used specific sentence texts, and the comparison module compares the result recognized by the speech recognition module with the specific sentence texts to judge the recognition accuracy of the speech recognition module.

6. The speech recognition system based on an artificial intelligence algorithm according to claim 1, characterized in that the camera module focuses on identifying the user's eye focus and lip movement.

7. The speech recognition system based on an artificial intelligence algorithm according to claim 1, characterized in that, when the intention judgment module determines that there is an operation intention, it further judges a reliability expressing the degree of that operation intention.

8. The speech recognition system based on an artificial intelligence algorithm according to claim 1, further comprising a control state change module, wherein, when the intention judgment module determines that there is no operation intention, the control state change module changes the control state of the controlled device in a direction less noticeable to the user than when it is determined that there is an operation intention.

9. The speech recognition system based on an artificial intelligence algorithm according to claim 8, wherein, when the reliability judged by the intention judgment module is low, the control state change module changes the state of the controlled device in a direction less noticeable to the user than when the reliability is high.

10. The speech recognition system based on an artificial intelligence algorithm according to claim 8, wherein, when recognition of the speech uttered by the user fails, the control state change module causes the controlled device to notify the user of the recognition failure, and when the reliability of the operation intention behind the utterance is low, it changes the state of the notification in a direction less noticeable to the user than when the reliability is high.