Speech recognition system based on artificial intelligence algorithm
Technical Field
The invention relates to the technical field of voice recognition, in particular to a voice recognition system based on an artificial intelligence algorithm.
Background
With the progress of data processing technology and the rapid spread of the mobile internet, computer technology is widely applied in every field of society, and massive amounts of data are generated. Among these data, voice data is receiving more and more attention. Speech recognition is an interdisciplinary field. Over the last two decades, speech recognition technology has made significant progress and has started to move from the laboratory to the market. It is expected that within the next 10 years voice recognition technology will enter fields such as industry, home appliances, communications, automotive electronics, medical care, home services and consumer electronics. The application of speech recognition dictation machines in some fields was rated by the U.S. press as one of the ten major computer developments of 1997. Many experts consider speech recognition to be one of the ten important technologies in the information technology field from 2000 to 2010. The fields that speech recognition draws on include signal processing, pattern recognition, probability and information theory, the mechanisms of sound production and hearing, and artificial intelligence. Speech recognition is technically complex, but it is more widely used than speech synthesis. Its greatest advantage is that it makes the human-machine interface more natural and easier to use.
With the rapid development of microelectronic and communication technology, embedded communication devices such as mobile phones have become almost indispensable for work and daily life, and users demand ever more functionality from them. Applying voice technology to these devices has therefore become a research hotspot. However, the accuracy of existing voice functions is not high; moreover, voice recognition is frequently triggered by mistake, and the system cannot judge whether the user actually intends to use voice recognition.
Therefore, there is a need to provide a new speech recognition system based on an artificial intelligence algorithm to solve the above technical problems.
Disclosure of Invention
The technical problem solved by the invention is to provide a voice recognition system based on an artificial intelligence algorithm that has high recognition accuracy and can judge the operation intention of the user.
In order to solve the above technical problem, the speech recognition system based on an artificial intelligence algorithm provided by the invention comprises:
a user interface for displaying content;
the voice receiving module is used for receiving a voice signal;
a speech recognition module for recognizing the speech signal, the speech recognition module comprising:
a signal conversion module, a feature extraction module, an encoding module, a codebook module and a decoding operation module;
the signal conversion module is used for converting the voice signal into a digital signal;
the feature extraction module is used for dividing the digital signal into frames and extracting the feature parameters of each frame to obtain a feature vector sequence;
the encoding module is used for converting the feature vector sequence into a feature code word sequence;
the codebook module stores the probability values of each feature code word matched against the code words in the codebook;
the decoding operation module is used for decoding the feature code word sequence to obtain a recognition result; during decoding, for each code word in the feature code word sequence, it directly looks up the code word with the maximum matching probability in the codebook module to obtain the decoding result;
the comparison module is used for detecting the accuracy of the decoding result;
the camera module is used for providing an image signal of a user;
and the intention judging module is used for judging whether the user has the intention of operating the controlled device or not according to the image signal shot by the camera module and the recognized voice.
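The framing and per-frame feature extraction performed by the feature extraction module can be illustrated with a minimal sketch. The source does not state the frame length, the frame shift, or which feature parameters are extracted, so the values below (400-sample frames, 160-sample shift) and the two features (log energy and zero-crossing rate) are assumptions chosen only for illustration; the function names are hypothetical.

```python
import numpy as np

def frame_signal(samples, frame_len=400, hop=160):
    """Split a 1-D digital signal into overlapping frames (one frame per row)."""
    samples = np.asarray(samples, dtype=float)
    if len(samples) < frame_len:
        # Pad short signals with zeros so that at least one frame exists.
        samples = np.pad(samples, (0, frame_len - len(samples)))
    n_frames = 1 + (len(samples) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return samples[idx]

def frame_features(frames):
    """Return one feature vector per frame: [log energy, zero-crossing rate]."""
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    signs = np.sign(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return np.stack([energy, zcr], axis=1)

# Usage: one second of a 440 Hz tone sampled at 16 kHz (assumed sample rate).
t = np.arange(16000) / 16000.0
features = frame_features(frame_signal(np.sin(2 * np.pi * 440 * t)))
```

Each row of `features` is one element of the feature vector sequence that is passed on to the encoding module.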
Preferably, the codebook is a Gaussian codebook.
Preferably, the encoding module converts the feature vector sequence into a feature codeword sequence according to the following steps:
S1: dividing the feature vector sequence into a plurality of subspaces, wherein each subspace corresponds to one codebook;
S2: calculating the distance measure between every feature vector in each subspace and each code word in the corresponding codebook, and taking the code word with the minimum distance measure to the feature vector as the code word corresponding to that feature vector in the feature code word sequence;
S3: combining the code words corresponding to all vectors in each subspace of the feature vector sequence in the order of the original vector sequence to obtain the corresponding feature code word sequence.
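The three encoding steps above can be sketched as follows. This is a sketch under stated assumptions, not the definitive procedure: the source does not say how the subspaces are formed or which distance measure is used, so the sketch assumes each subspace is a contiguous range of feature dimensions and uses squared Euclidean distance. The names `encode` and `splits` are hypothetical.

```python
import numpy as np

def encode(features, codebooks, splits):
    """Map each feature vector to one code word index per subspace.

    features:  (T, D) feature vector sequence
    codebooks: list of (K_i, d_i) arrays, one codebook per subspace
    splits:    list of (start, stop) dimension ranges, one per subspace (assumed layout)
    """
    features = np.asarray(features, dtype=float)
    codes = np.empty((features.shape[0], len(codebooks)), dtype=int)
    for s, ((start, stop), book) in enumerate(zip(splits, codebooks)):
        # Step 1: restrict every vector to the dimensions of this subspace.
        sub = features[:, start:stop]
        # Step 2: squared Euclidean distance from each sub-vector to each code word,
        # then keep the index of the nearest code word.
        dist = ((sub[:, None, :] - np.asarray(book, dtype=float)[None, :, :]) ** 2).sum(axis=2)
        codes[:, s] = dist.argmin(axis=1)
    # Step 3: rows are still in the original vector order.
    return codes

# Usage with two subspaces: dimensions 0-1 and dimension 2.
splits = [(0, 2), (2, 3)]
codebooks = [np.array([[0.0, 1.0], [1.0, 0.0]]), np.array([[-4.0], [4.0]])]
codes = encode([[0.1, 0.9, 5.0], [0.9, 0.1, -5.0]], codebooks, splits)
```

Here `codes` holds, for each input vector, the index of the nearest code word in each subspace codebook.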
Preferably, the codebook module is generated by the following steps:
L1: calculating the mean vector and the variance vector corresponding to each code word in the Gaussian codebook;
L2: using the mean and variance vectors, calculating the logarithmic probability value of each code word in the feature codebook matched against each code word in the Gaussian codebook;
L3: storing the probability values of all code words in the feature codebook matched against all code words in the Gaussian codebook to obtain the codebook module.
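One way to read the table-building steps above is as a precomputed matrix of log probabilities. The sketch below assumes that each Gaussian code word is a Gaussian with a diagonal covariance (one variance per dimension) and that its mean and variance vectors have already been estimated; neither detail is stated in the source, and the function name is hypothetical.

```python
import math
import numpy as np

def build_log_prob_table(feature_codebook, means, variances):
    """Return table[i, j] = log N(feature code word i; mean j, diag(variance j))."""
    c = np.asarray(feature_codebook, dtype=float)[:, None, :]   # (Kf, 1, d)
    mu = np.asarray(means, dtype=float)[None, :, :]             # (1, Kg, d)
    var = np.asarray(variances, dtype=float)[None, :, :]        # (1, Kg, d)
    # Per-dimension log density of a Gaussian, summed over dimensions.
    log_pdf = -0.5 * (np.log(2 * np.pi * var) + (c - mu) ** 2 / var)
    return log_pdf.sum(axis=2)                                  # (Kf, Kg)

# Usage: two 1-D feature code words against two unit-variance Gaussians.
table = build_log_prob_table([[0.0], [3.0]], means=[[0.0], [3.0]], variances=[[1.0], [1.0]])
```

Storing the table once means that no Gaussian needs to be evaluated at recognition time; only table lookups remain.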
Preferably, the comparison module stores a plurality of commonly used specific sentence texts, and compares the result recognized by the speech recognition module with the specific sentence texts to judge the recognition accuracy of the speech recognition module.
Preferably, the camera module focuses on capturing the eye focus and lip movement of the user.
Preferably, when the intention judging module determines that there is an operation intention, it also determines a reliability indicating the degree of that operation intention.
Preferably, the system further includes a control state changing module that, when the intention judging module determines that there is no operation intention, changes the control state of the controlled device in a manner that is less noticeable to the user than when it determines that there is an operation intention.
Preferably, when the reliability determined by the intention judging module is low, the control state changing module changes the control state of the controlled device in a manner that is less noticeable to the user than when the reliability is high.
Preferably, when recognition of the voice uttered by the user fails, the control state changing module controls the controlled device to notify the user of the recognition failure, and, when the reliability of the operation intention behind the utterance is low, makes the notification less noticeable to the user than when the reliability is high.
Compared with the related art, the voice recognition system based on the artificial intelligence algorithm has the following beneficial effects:
the invention provides a voice recognition system based on an artificial intelligence algorithm in which, during the clustering of the voice feature vector set to obtain the codebook, steps are added that dynamically merge and split subsets according to the number of vectors in each subset and the total distance measure of the vectors in each subset. This reduces the sum of the distance measures between the vectors in the clustered set and their corresponding code words, improves the precision of the clustering algorithm, ensures the recognition performance of the voice system, and greatly reduces the storage requirements of the system. In addition, when it is determined that the user has no operation intention, the control state of the controlled device is changed in a manner that is less noticeable to the user than when it is determined that the user has an operation intention, which makes the system more comfortable to use.
Detailed Description
The present invention will be further described with reference to the following embodiments.
An artificial intelligence algorithm based speech recognition system comprising:
a user interface for displaying content;
the voice receiving module is used for receiving a voice signal;
a speech recognition module for recognizing the speech signal, the speech recognition module comprising:
a signal conversion module, a feature extraction module, an encoding module, a codebook module and a decoding operation module;
the signal conversion module is used for converting the voice signal into a digital signal;
the feature extraction module is used for dividing the digital signal into frames and extracting the feature parameters of each frame to obtain a feature vector sequence;
the encoding module is used for converting the feature vector sequence into a feature code word sequence;
the codebook module stores the probability values of each feature code word matched against the code words in the codebook;
the decoding operation module is used for decoding the feature code word sequence to obtain a recognition result; during decoding, for each code word in the feature code word sequence, it directly looks up the code word with the maximum matching probability in the codebook module to obtain the decoding result;
the comparison module is used for detecting the accuracy of the decoding result;
the camera module is used for providing an image signal of a user;
and the intention judging module is used for judging whether the user has the intention of operating the controlled device or not according to the image signal shot by the camera module and the recognized voice.
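The table lookup performed by the decoding operation module can be sketched as follows. The sketch assumes the codebook module is available as a matrix whose entry (i, j) is the stored probability (here a log probability) of feature code word i matching codebook code word j; that layout and the function name `decode` are assumptions made for illustration.

```python
import numpy as np

def decode(code_sequence, log_prob_table):
    """For each feature code word index, return the best-matching codebook code word index."""
    table = np.asarray(log_prob_table, dtype=float)
    # The best match per feature code word depends only on the table,
    # so it is computed once and then looked up for the whole sequence.
    best = table.argmax(axis=1)
    return best[np.asarray(code_sequence, dtype=int)]

# Usage: three feature code words, two codebook code words.
table = [[-1.0, -3.0], [-5.0, -0.5], [-2.0, -2.5]]
result = decode([0, 1, 2, 1], table)
```

Because the lookup replaces any probability computation at recognition time, the cost per frame is a single array index.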
The codebook is a Gaussian codebook.
The encoding module converts the feature vector sequence into a feature codeword sequence according to the following steps:
S1: dividing the feature vector sequence into a plurality of subspaces, wherein each subspace corresponds to one codebook;
S2: calculating the distance measure between every feature vector in each subspace and each code word in the corresponding codebook, and taking the code word with the minimum distance measure to the feature vector as the code word corresponding to that feature vector in the feature code word sequence;
S3: combining the code words corresponding to all vectors in each subspace of the feature vector sequence in the order of the original vector sequence to obtain the corresponding feature code word sequence.
The codebook module is generated by the following steps:
L1: calculating the mean vector and the variance vector corresponding to each code word in the Gaussian codebook;
L2: using the mean and variance vectors, calculating the logarithmic probability value of each code word in the feature codebook matched against each code word in the Gaussian codebook;
L3: storing the probability values of all code words in the feature codebook matched against all code words in the Gaussian codebook to obtain the codebook module.
The comparison module stores a plurality of common specific sentence texts, compares the result recognized by the voice recognition module with the specific sentence texts, and judges the recognition accuracy of the voice recognition module.
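A minimal sketch of such a comparison follows. The source does not specify how the recognized result is compared with the stored sentences, so the similarity measure used here (the ratio from Python's standard `difflib.SequenceMatcher`) is an assumption, and the stored sentences and the function name `best_match` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical examples of commonly used sentences stored by the comparison module.
REFERENCE_SENTENCES = ["turn on the light", "turn off the light", "play music"]

def best_match(recognized, references=REFERENCE_SENTENCES):
    """Return the stored sentence closest to the recognized text and its similarity in [0, 1]."""
    text = recognized.lower().strip()
    scored = [(SequenceMatcher(None, text, ref).ratio(), ref) for ref in references]
    score, sentence = max(scored)
    return sentence, score

# Usage: a recognition result with one misrecognized word.
sentence, score = best_match("turn on the lite")
```

A low score for every stored sentence would suggest either a recognition error or an utterance outside the stored set; the threshold for that decision is left open here.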
The camera module focuses on capturing the eye focus and lip movement of the user.
When the intention judging module determines that there is an operation intention, it also determines a reliability indicating the degree of that operation intention.
The system further includes a control state changing module that, when the intention judging module determines that there is no operation intention, changes the control state of the controlled device in a manner that is less noticeable to the user than when it determines that there is an operation intention.
When the reliability determined by the intention judging module is low, the control state changing module changes the control state of the controlled device in a manner that is less noticeable to the user than when the reliability is high.
When recognition of the voice uttered by the user fails, the control state changing module controls the controlled device to notify the user of the recognition failure, and, when the reliability of the operation intention behind the utterance is low, makes the notification less noticeable to the user than when the reliability is high.
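The source does not describe how the intention or its reliability is computed from the image signal and the recognized voice. Purely as an illustration, the sketch below assumes the camera module yields per-frame flags for "gaze on the device" and "lips moving", and combines them with equal weights; the weights, the threshold, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IntentResult:
    has_intent: bool
    reliability: float  # in [0, 1]; 0.0 when no operation intention is found

def judge_intent(gaze_on_device, lips_moving, speech_recognized, threshold=0.5):
    """Judge operation intention from per-frame camera flags and the recognition outcome."""
    if not speech_recognized or not gaze_on_device:
        return IntentResult(False, 0.0)
    gaze = sum(gaze_on_device) / len(gaze_on_device)            # fraction of frames looking at the device
    lips = sum(lips_moving) / len(lips_moving) if lips_moving else 0.0
    score = 0.5 * gaze + 0.5 * lips                             # equal weights are an assumption
    if score < threshold:
        return IntentResult(False, 0.0)
    return IntentResult(True, score)

# Usage: the user looks at the device throughout and moves their lips in half of the frames.
result = judge_intent([True] * 4, [True, True, False, False], speech_recognized=True)
```

The reliability value is what a downstream module could use to decide how prominently the controlled device should respond.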
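The behaviour on recognition failure can be sketched as a mapping from the judged intention and its reliability to a notification level. The three levels, their descriptions, and the reliability cut-off of 0.7 are hypothetical; the source only requires that the notification be less noticeable when reliability is low or no operation intention is found.

```python
from enum import Enum

class Prominence(Enum):
    PROMINENT = "spoken prompt plus on-screen message"
    SUBTLE = "small on-screen icon only"
    MINIMAL = "no prompt; status indicator unchanged"

def failure_notification(has_intent, reliability, high=0.7):
    """Choose how noticeably the controlled device reports a recognition failure."""
    if not has_intent:
        # No operation intention: keep the device's reaction as unobtrusive as possible.
        return Prominence.MINIMAL
    # With an operation intention, scale the prominence with the reliability.
    return Prominence.PROMINENT if reliability >= high else Prominence.SUBTLE

# Usage: a failed recognition where the intention was judged with low reliability.
level = failure_notification(has_intent=True, reliability=0.55)
```

The design intent is that a user who was not addressing the device is not interrupted by failure prompts, while a user who clearly was is told to repeat the command.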
Compared with the related art, the voice recognition system based on the artificial intelligence algorithm has the following beneficial effects:
the invention provides a voice recognition system based on an artificial intelligence algorithm in which, during the clustering of the voice feature vector set to obtain the codebook, steps are added that dynamically merge and split subsets according to the number of vectors in each subset and the total distance measure of the vectors in each subset. This reduces the sum of the distance measures between the vectors in the clustered set and their corresponding code words, improves the precision of the clustering algorithm, ensures the recognition performance of the voice system, and greatly reduces the storage requirements of the system. In addition, when it is determined that the user has no operation intention, the control state of the controlled device is changed in a manner that is less noticeable to the user than when it is determined that the user has an operation intention, which makes the system more comfortable to use.
The above description is only an embodiment of the present invention and is not intended to limit the scope of the present invention. Any equivalent structure or equivalent process derived from the contents of this specification, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.
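The clustering with dynamic merging and splitting mentioned above is not specified in detail in the source. The following is a simplified sketch of one possible reading, built on ordinary k-means: after each assignment, a subset with too few vectors is dissolved (its vectors are absorbed by neighbouring subsets at the next assignment) and its code word is re-seeded inside the subset with the largest total distance, which splits that subset. The function name, the parameters `min_count` and `init`, and the re-seeding rule are all assumptions.

```python
import numpy as np

def train_codebook(vectors, k, init=None, iters=20, min_count=2, seed=0):
    """K-means-style codebook training with one merge/split check per iteration."""
    x = np.asarray(vectors, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        code = x[rng.choice(len(x), size=k, replace=False)].copy()
    else:
        code = np.array(init, dtype=float)
    for it in range(iters):
        # Assign every vector to its nearest code word (squared Euclidean distance).
        dist = ((x[:, None, :] - code[None, :, :]) ** 2).sum(axis=2)
        label = dist.argmin(axis=1)
        nearest = dist[np.arange(len(x)), label]
        count = np.bincount(label, minlength=k)                    # vectors per subset
        total = np.bincount(label, weights=nearest, minlength=k)   # total distance per subset
        # Move each non-empty subset's code word to the centroid of its vectors.
        for j in np.flatnonzero(count):
            code[j] = x[label == j].mean(axis=0)
        small, worst = int(count.argmin()), int(total.argmax())
        if it < iters - 1 and small != worst and count[small] < min_count and count[worst] >= 2:
            # Merge/split: give up the under-populated subset and re-seed its code word
            # at the vector that was farthest from the worst subset's code word.
            members = np.flatnonzero(label == worst)
            code[small] = x[members[nearest[members].argmax()]]
    return code

# Usage: two tight groups of points and a deliberately poor initial codebook.
points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
codebook = train_codebook(points, k=2, init=[[-5.0, -5.0], [5.5, 5.5]])
```

In this small example the first iteration leaves one subset with a single vector, the merge/split step re-seeds it in the far group, and the following iterations settle on one code word per group.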