CN112530460A - Voice enhancement quality evaluation method, device, terminal and storage medium - Google Patents


Info

Publication number
CN112530460A
Authority
CN
China
Prior art keywords
signal
noise
speech
voice
enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011376869.8A
Other languages
Chinese (zh)
Inventor
方泽煌
康元勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yealink Network Technology Co Ltd
Original Assignee
Xiamen Yealink Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yealink Network Technology Co Ltd
Priority to CN202011376869.8A
Publication of CN112530460A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract



The invention discloses a speech enhancement quality evaluation method, device, terminal and storage medium. A clean speech signal is used as the original signal; before speech enhancement, different types of noise are superimposed on it to generate a noisy speech signal, which is then processed by a speech enhancement algorithm to generate a speech enhancement signal. Finally, the clean original speech signal and the speech enhancement signal are fed into PESQ to obtain a speech enhancement quality evaluation score. Because the method can simulate noisy speech signals for arbitrary scenarios, it makes speech enhancement quality evaluation more flexible and convenient in large-scale scenario testing.


Description

Voice enhancement quality evaluation method, device, terminal and storage medium
Technical Field
The present invention relates to the field of voice communication technologies, and in particular, to a method, an apparatus, a terminal, and a storage medium for evaluating voice enhancement quality.
Background
With the development of conference and VoIP communication, users have ever higher requirements for the voice quality of conference terminals, and equipping conference terminals with speech enhancement technology has therefore become standard practice.
To test voice quality after speech enhancement, the prior art discloses a terminal voice quality evaluation method and terminal, in which: the terminal acquires a source standard sound source signal according to the voice quality type to be evaluated; the terminal processes the source standard sound source signal according to that voice quality type; and the terminal feeds the source standard sound source signal and the processed signal into the PESQ algorithm, which evaluates the voice quality. The voice quality types include terminal uplink voice quality, terminal downlink voice quality, and the voice quality of a terminal voice channel. This prior-art method allows a terminal to evaluate its own voice quality independently and simply.
However, the inventors found that this prior art has the following defect: because a specified standard sound source signal must be used in every test, the method involves cumbersome steps and is inconvenient to apply to large-scale scenario tests.
Disclosure of Invention
The invention aims to provide a speech enhancement quality evaluation method, device, terminal and storage medium that overcome this defect. A clean speech signal is used as the original signal; before speech enhancement, different types of noise are superimposed on it to generate a noisy speech signal; the noisy speech signal is then processed by a speech enhancement algorithm to generate a speech enhancement signal; and finally the clean original speech signal and the speech enhancement signal are fed into PESQ to obtain a speech enhancement quality evaluation score.
In a first aspect, an embodiment of the present invention provides a method for evaluating speech enhancement quality, where the method includes the following steps:
a data acquisition step: acquiring clean speech signals and noise data;
a noisy speech signal generation step: processing the clean speech signal and the noise data according to preset evaluation content to generate a noisy speech signal, wherein the evaluation content includes the scenario in which the noise data was generated and the signal-to-noise ratio of the noisy speech signal;
a speech enhancement signal generation step: processing the noisy speech signal with a speech enhancement algorithm to generate a speech enhancement signal;
a speech enhancement quality evaluation step: feeding the clean speech signal and the speech enhancement signal into PESQ to calculate a speech enhancement quality score.
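As a rough illustration of how the four steps chain together, here is a minimal Python sketch assuming NumPy. The function name `evaluate_enhancement` and its parameters are hypothetical: `enhance` stands in for whatever speech enhancement algorithm is under test, `score` stands in for the PESQ comparison, and the noise scaling assumes SNR is defined as a speech-to-noise power ratio in decibels.

```python
from typing import Callable

import numpy as np


def evaluate_enhancement(
    clean: np.ndarray,
    noise: np.ndarray,
    snr_db: float,
    enhance: Callable[[np.ndarray], np.ndarray],
    score: Callable[[np.ndarray, np.ndarray], float],
) -> float:
    """Hypothetical pipeline: clean + scaled noise -> enhancer -> score(clean, enhanced).

    The data acquisition step (obtaining `clean` and `noise`) is assumed to be done
    already, and `noise` is assumed to be at least as long as `clean`.
    """
    # Noisy-signal generation step: scale the noise to the target SNR and add it.
    noise = noise[: len(clean)]
    alpha = np.sqrt(np.sum(clean**2) / (np.sum(noise**2) * 10 ** (snr_db / 10)))
    noisy = clean + alpha * noise
    # Enhancement step: run the algorithm under test (any callable works here).
    enhanced = enhance(noisy)
    # Evaluation step: compare the clean reference with the enhanced output.
    return score(clean, enhanced)
```

Passing the real enhancer and a PESQ wrapper as the two callables keeps the evaluation loop independent of any particular algorithm, which is what makes large batches of scenario tests easy to script.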
As a further improvement of the first aspect of the present invention, the step of generating a noisy speech signal specifically includes the steps of:
selecting a plurality of pieces of noise data to be superposed to generate a noise signal;
calculating a scaling coefficient of a corresponding noise signal according to the signal-to-noise ratio of the voice signal with the noise;
and carrying out scaling processing on the noise signal according to the scaling coefficient, and superposing the noise signal after scaling processing and the clean voice signal to generate a voice signal with noise.
As a further improvement of the first aspect of the present invention, the calculation formula of the scaling factor is as follows:
[Equation image not reproduced: formula for the noise scaling factor $\alpha_{\mathrm{noise}}$]
where $\alpha_{\mathrm{noise}}$ denotes the scaling factor of the noise signal and snr denotes the signal-to-noise ratio of the noisy speech signal.
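The scaling-factor formula is not reproduced in this text. Assuming the common definition of SNR as the ratio of speech energy to scaled-noise energy over the same T samples (an assumption, not confirmed by the text here), the scaling factor follows by solving that definition for $\alpha_{\mathrm{noise}}$:

$$\mathrm{snr} = 10\log_{10}\frac{\sum_{t=1}^{T}\mathrm{speech}(t)^2}{\sum_{t=1}^{T}\left(\alpha_{\mathrm{noise}}\,\mathrm{noise}(t)\right)^2}$$
$$10^{\mathrm{snr}/10} = \frac{\sum_{t=1}^{T}\mathrm{speech}(t)^2}{\alpha_{\mathrm{noise}}^2\sum_{t=1}^{T}\mathrm{noise}(t)^2}$$
$$\alpha_{\mathrm{noise}} = \sqrt{\frac{\sum_{t=1}^{T}\mathrm{speech}(t)^2}{10^{\mathrm{snr}/10}\sum_{t=1}^{T}\mathrm{noise}(t)^2}}$$

Using mean powers (dividing each sum by T) gives the same result, since T cancels. For example, if the speech and noise energies are equal and the target snr is 20 dB, then $\alpha_{\mathrm{noise}} = \sqrt{1/100} = 0.1$.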
As a further improvement of the first aspect of the present invention, the calculation formula for generating the noisy speech signal is as follows:
$$\mathrm{noisy}(x) = \mathrm{speech}(x) + \alpha_{\mathrm{noise}} \cdot \mathrm{noise}(x)$$
where noisy(x) denotes the noisy speech signal, speech(x) denotes the clean speech signal, and noise(x) denotes the noise signal.
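The mixing formula can be sketched in Python with NumPy as follows. The name `mix_at_snr` is hypothetical, the SNR is assumed to be a speech-to-scaled-noise power ratio in decibels, and tiling a noise clip that is shorter than the speech is an assumed policy that the text does not specify.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> tuple[np.ndarray, float]:
    """Return (noisy, alpha_noise) with noisy = speech + alpha_noise * noise.

    Assumes SNR = 10*log10(speech energy / scaled-noise energy).
    """
    if len(noise) < len(speech):
        # Repeat a short noise clip so it covers the whole utterance (assumed policy).
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = np.asarray(noise[: len(speech)], dtype=np.float64)
    speech_energy = np.sum(np.asarray(speech, dtype=np.float64) ** 2)
    noise_energy = np.sum(noise**2)
    if noise_energy == 0:
        raise ValueError("noise segment is silent; cannot reach a finite SNR")
    alpha = np.sqrt(speech_energy / (noise_energy * 10 ** (snr_db / 10)))
    return speech + alpha * noise, float(alpha)
```

Returning `alpha_noise` alongside the mixture makes it easy to log the exact scaling used for each test case.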
As a further improvement of the first aspect of the present invention, the acquiring noise data specifically includes the following steps:
collecting noise data by recording, or by downloading an open-source database over a network;
and generating a noise library according to the collected noise data.
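Building a noise library from collected clips could look like the following Python sketch, assuming the clips have already been recorded or downloaded as 16-bit PCM WAV files at 16 kHz. The folder layout `root/<scenario>/*.wav` and the function names are hypothetical; grouping clips by scenario folder mirrors the noise-generation scenario that forms part of the evaluation content.

```python
import wave
from pathlib import Path

import numpy as np


def read_pcm16_wav(path: Path, expected_rate: int = 16000) -> np.ndarray:
    """Read a 16-bit PCM WAV file as float samples in [-1, 1), down-mixed to mono."""
    with wave.open(str(path), "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError(f"{path}: only 16-bit PCM is handled in this sketch")
        if f.getframerate() != expected_rate:
            raise ValueError(f"{path}: expected {expected_rate} Hz, got {f.getframerate()} Hz")
        channels = f.getnchannels()
        raw = f.readframes(f.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # WAV frames are interleaved, so reshape to (frames, channels) and average.
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples


def build_noise_library(root: str, expected_rate: int = 16000) -> dict[str, list[np.ndarray]]:
    """Hypothetical layout: root/<scenario>/*.wav, one folder per noise scenario."""
    library: dict[str, list[np.ndarray]] = {}
    for wav_path in sorted(Path(root).glob("*/*.wav")):
        scenario = wav_path.parent.name
        library.setdefault(scenario, []).append(read_pcm16_wav(wav_path, expected_rate))
    return library
```

Rejecting clips at the wrong sample rate up front avoids silently mixing 8 kHz noise into 16 kHz speech later on.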
In a second aspect, an embodiment of the present invention provides a speech enhancement quality assessment apparatus, where the apparatus includes:
the data acquisition module is used for acquiring clean voice signals and noise data;
the noisy speech signal generation module is used for processing the clean speech signal and the noise data according to preset evaluation content to generate a noisy speech signal; wherein the evaluation content includes a generation scenario of noise data and a signal-to-noise ratio of a noisy speech signal;
the voice enhancement signal generation module is used for processing the voice signal with noise by utilizing a voice enhancement algorithm to generate a voice enhancement signal;
and the voice enhancement quality evaluation module is used for feeding the clean voice signal and the voice enhancement signal into PESQ to calculate a voice enhancement quality score.
As a further development of the second aspect of the invention, the noisy speech signal generation module comprises the following subunits:
the noise signal generation subunit is used for selecting a plurality of pieces of noise data to be superposed so as to generate a noise signal;
the scaling coefficient calculation subunit is used for calculating the scaling coefficient of the corresponding noise signal according to the signal-to-noise ratio of the voice signal with noise;
and the noisy speech signal generating subunit is used for carrying out scaling processing on the noise signal according to the scaling coefficient and superposing the scaled noise signal and the clean speech signal to generate a noisy speech signal.
As a further development of the second aspect of the invention, the data acquisition module comprises a noise data acquisition sub-module for acquiring noise data, the noise data acquisition sub-module comprising the following sub-units:
the acquisition subunit is used for collecting noise data by recording, or by downloading an open-source database over a network;
and the noise library generating subunit is used for generating a noise library according to the collected noise data.
In a third aspect, an embodiment of the present invention provides a speech enhancement quality assessment terminal, including: memory, processor and computer program stored on the memory and executable on the processor, the processor implementing the speech enhancement quality assessment method according to any of the embodiments of the first aspect of the present invention when executing the program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are configured to cause a computer to execute a speech enhancement quality assessment method according to any embodiment of the first aspect of the present invention.
Compared with the prior art, the embodiment of the invention at least has the following beneficial effects:
1. The speech enhancement quality evaluation method, device, terminal and storage medium provided by the invention can handle speech sources at any signal-to-noise ratio, and the evaluation can be completed on the conference terminal alone, without the cooperation of additional equipment (such as a server).
2. In the speech enhancement quality evaluation method, device, terminal and storage medium provided by the invention, a clean speech signal is used as the original signal; different types of noise are superimposed before speech enhancement to generate a noisy speech signal; the noisy speech signal is processed by the speech enhancement algorithm to generate a speech enhancement signal; and finally the clean original speech signal and the speech enhancement signal are fed into PESQ to obtain the speech enhancement quality evaluation score. Because noisy speech for any scenario can be simulated in this way, evaluation in large-scale scenario tests is more flexible and convenient.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The invention is further described below with reference to the accompanying drawings and examples;
fig. 1 is a block diagram of a terminal for performing a speech enhancement quality assessment method according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating an example of a speech enhancement quality evaluation method according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating another example of a speech enhancement quality evaluation method according to an embodiment of the present invention.
Fig. 4 is a block diagram of a speech enhancement quality evaluation apparatus according to an embodiment of the present invention.
FIG. 5 is a block diagram of a computer device in one embodiment.
Detailed Description
Reference will now be made in detail to the present preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.
Fig. 1 is a block diagram of a terminal for performing a speech enhancement quality assessment method according to an embodiment. Referring to fig. 1, the voice enhancement quality assessment method is applied to a conference terminal 110. The conference terminal 110 is a computer device with data transmission and processing capabilities, and the conference terminal 110 is electrically connected with a microphone 120 and an analog-to-digital converter 130 for converting acoustic signals in the environment into digital signals suitable for calculation. It should be noted that the speech enhancement quality evaluation method may also be applied to other terminals, such as a desktop terminal or a mobile terminal, where the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like.
To facilitate understanding of the present invention by those skilled in the art, technical terms related to embodiments of the present invention are briefly described below.
An ADC (analog-to-digital converter) is a class of devices that convert a continuous analog signal into a discrete digital signal, so that the signal can be measured and processed digitally. The device performing the opposite conversion is called a digital-to-analog converter (DAC).
A microphone (also called a mic or sound transducer) is an energy conversion device that converts a sound signal into an electrical signal.
The signal-to-noise ratio (SNR or S/N) is the ratio of signal to noise in an electronic device or system. In the narrow sense, it is the ratio of the power of an amplifier's output signal to the power of the noise output at the same time, usually expressed in decibels; the higher a device's signal-to-noise ratio, the less noise it generates.
PESQ (Perceptual Evaluation of Speech Quality) is an objective speech quality assessment method: ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) Recommendation P.862 specifies it as an objective method for estimating MOS (mean opinion score) values.
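The decibel form of the signal-to-noise ratio defined above can be computed directly from sample arrays. This short Python sketch (NumPy assumed, function name hypothetical) uses mean power, so for example a signal whose power is 100 times the noise power has an SNR of 10·log10(100) = 20 dB.

```python
import numpy as np


def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Signal-to-noise ratio in decibels: 10*log10(P_signal / P_noise), using mean power."""
    p_signal = np.mean(np.asarray(signal, dtype=np.float64) ** 2)
    p_noise = np.mean(np.asarray(noise, dtype=np.float64) ** 2)
    return float(10 * np.log10(p_signal / p_noise))
```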
The speech enhancement quality evaluation method provided by the embodiment of the present invention will be described and explained in detail by several specific embodiments.
In one embodiment, as shown in FIG. 2, a speech enhancement quality assessment method is provided. The embodiment is mainly illustrated by applying the method to computer equipment. The computer device may specifically be the conference terminal 110 in fig. 1 described above.
Referring to fig. 2, in this embodiment, the speech enhancement quality evaluation method includes the following steps:
step S102, data acquisition step: conference terminal 110 acquires clean voice signals and noise data.
It should be noted that the acquiring of the noise data specifically includes the following steps:
step a: the conference terminal 110 collects noise data by recording, or by downloading an open-source database over a network;
step b: the conference terminal 110 generates a noise library from the collected noise data.
It is understood that, as an example, the voice enhancement quality scoring under the actual application environment can be realized by recording the noise of the environment through the microphone 120 shown in fig. 1.
In another example, noise data can be collected by downloading an open-source database over a network; using such databases allows speech enhancement quality to be tested under a wide variety of application environments or scenarios, which improves testing efficiency.
Step S104, generating a noisy speech signal: the conference terminal 110 processes the clean voice signal and the noise data according to a preset evaluation content to generate a voice signal with noise; wherein the evaluation content includes a generation scenario of the noise data and a signal-to-noise ratio of the noisy speech signal.
It should be noted that the step of generating a noisy speech signal specifically includes the following steps:
step S1041: the conference terminal 110 selects a plurality of pieces of noise data to be superimposed to generate a noise signal;
step S1042: the conference terminal 110 calculates a scaling coefficient of a corresponding noise signal according to the signal-to-noise ratio of the voice signal with noise;
the calculation formula of the scaling factor is as follows:
[Equation image not reproduced: formula for the noise scaling factor $\alpha_{\mathrm{noise}}$]
where $\alpha_{\mathrm{noise}}$ denotes the scaling factor of the noise signal and snr denotes the signal-to-noise ratio of the noisy speech signal.
Step S1043: the conference terminal 110 performs scaling processing on the noise signal according to the scaling coefficient, and superimposes the scaled noise signal with the clean voice signal to generate a noisy voice signal.
The calculation formula for generating the noisy speech signal is as follows:
$$\mathrm{noisy}(x) = \mathrm{speech}(x) + \alpha_{\mathrm{noise}} \cdot \mathrm{noise}(x)$$
where noisy(x) denotes the noisy speech signal, speech(x) denotes the clean speech signal, and noise(x) denotes the noise signal.
It can be understood that, once noise data for various scenarios has been acquired in step S102, the signal-to-noise ratio of the noisy speech signal can be set arbitrarily in step S104, which facilitates large-scale speech enhancement quality evaluation across many scenarios.
Step S106, generating a speech enhancement signal: conference terminal 110 processes the noisy speech signal using a speech enhancement algorithm to generate a speech enhancement signal;
step S108, speech enhancement quality evaluation step: the conference terminal 110 feeds the clean voice signal and the voice enhancement signal into PESQ to calculate a voice enhancement quality score.
In summary, the speech enhancement quality assessment method in this embodiment uses a clean speech signal as an original signal, superimposes different types of noise to generate a noisy speech signal before performing speech enhancement, then processes the noisy speech signal through a speech enhancement algorithm to generate a speech enhancement signal, and finally introduces the clean original speech signal and the speech enhancement signal into PESQ to obtain a speech enhancement quality assessment score.
In a preferred embodiment, a speech enhancement quality assessment method is provided, comprising the following steps:
Step one: prepare a clean speech signal, i.e., a speech signal with a signal-to-noise ratio above 60 dB and a sampling rate of 16 kHz.
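A clean-speech file can be screened for the expected format before it enters the pipeline. The sketch below uses Python's standard `wave` module; only the 16 kHz rate comes from the text, while requiring mono 16-bit PCM is an assumption of this sketch. The 60 dB SNR requirement cannot be verified from the file header alone and is not checked here.

```python
import wave


def check_clean_speech_format(path: str, expected_rate: int = 16000) -> None:
    """Raise ValueError unless the WAV file is mono, 16-bit PCM at `expected_rate` Hz."""
    with wave.open(path, "rb") as f:
        rate, channels, width = f.getframerate(), f.getnchannels(), f.getsampwidth()
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate {rate} Hz != {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if width != 2:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    if problems:
        raise ValueError(f"{path}: " + "; ".join(problems))
```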
Step two: add a noise signal. This step comprises the following four substeps:
substep 1: acquire a noise library. In this embodiment, the noise library is obtained by recording and by downloading open-source databases from the internet; different noise libraries can be prepared for different test scenarios;
substep 2: select noise. N pieces of noise data are selected at random and superimposed according to the following formula:
$$\mathrm{noise}(x) = \alpha_0\,\mathrm{noise}_0(x) + \alpha_1\,\mathrm{noise}_1(x) + \cdots + \alpha_{N-1}\,\mathrm{noise}_{N-1}(x)$$
where noise(x) denotes the superimposed noise signal and $\alpha_n$ denotes the scaling parameter of the n-th noise piece, which takes values in the range (0, 1).
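Substep 2 can be sketched as below, assuming NumPy. How clips of different lengths are aligned is not specified in the text, so repeating or truncating each clip with `np.resize` is an assumption, as is the name `superpose_noises`. The weights are drawn with `rng.uniform(0, 1)`, which samples [0, 1) and so matches the (0, 1) range except for a measure-zero endpoint.

```python
import numpy as np


def superpose_noises(clips: list[np.ndarray], n_select: int, length: int,
                     rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return (noise, chosen_indices, alphas) with noise = sum_n alphas[n] * clip_n.

    `n_select` distinct clips are chosen at random; each is repeated or truncated
    to `length` samples before weighting (an assumed alignment policy).
    """
    chosen = rng.choice(len(clips), size=n_select, replace=False)
    alphas = rng.uniform(0.0, 1.0, size=n_select)
    noise = np.zeros(length, dtype=np.float64)
    for idx, alpha in zip(chosen, alphas):
        # np.resize fills the target length with repeated copies of the clip.
        noise += alpha * np.resize(np.asarray(clips[idx], dtype=np.float64), length)
    return noise, chosen, alphas
```

Returning the chosen indices and weights lets each generated test case be reproduced or audited later.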
Substep 3: calculate the noise scaling coefficient. A target signal-to-noise ratio is specified (in this embodiment, drawn at random from the range (-15 dB, 20 dB)), and the scaling coefficient of the noise signal is solved from this known signal-to-noise ratio and the amplitude of the clean speech signal.
The signal-to-noise ratio calculation formula is as follows:
[Equation image not reproduced: signal-to-noise ratio formula]
where T denotes the number of samples in the signal. Solving this formula for the noise scaling factor gives:
[Equation image not reproduced: noise scaling factor formula]
where $\alpha_{\mathrm{noise}}$ denotes the scaling factor of the noise signal and snr denotes the signal-to-noise ratio of the noisy speech signal.
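Because the SNR and scaling-factor formulas above are only available as images, the following Python sketch assumes the usual form: mean power over the T samples for each signal, with SNR as their ratio in decibels. It draws the target SNR uniformly from (-15 dB, 20 dB) as in this embodiment; the function name and the truncation of the noise to T samples are hypothetical choices.

```python
import numpy as np


def draw_snr_and_alpha(speech: np.ndarray, noise: np.ndarray, rng: np.random.Generator,
                       snr_range: tuple[float, float] = (-15.0, 20.0)) -> tuple[float, float]:
    """Draw a target SNR (dB) uniformly from snr_range and solve for alpha_noise.

    Powers are averaged over the T samples of the speech; the noise is assumed to be
    at least T samples long and is truncated to match, so T cancels in the ratio.
    """
    T = len(speech)
    noise = np.asarray(noise[:T], dtype=np.float64)
    snr = rng.uniform(*snr_range)
    p_speech = np.sum(np.asarray(speech, dtype=np.float64) ** 2) / T
    p_noise = np.sum(noise**2) / T
    alpha = np.sqrt(p_speech / (p_noise * 10 ** (snr / 10)))
    return float(snr), float(alpha)
```

Seeding the generator (for example `np.random.default_rng(42)`) makes a whole batch of randomly chosen test SNRs reproducible across runs.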
Substep 4: generating a noisy speech signal
Generating the noisy speech according to the noise scaling factor calculated in substep 3, the calculation formula being as follows:
$$\mathrm{noisy}(x) = \mathrm{speech}(x) + \alpha_{\mathrm{noise}} \cdot \mathrm{noise}(x)$$
where noisy(x) denotes the noisy speech signal, speech(x) denotes the clean speech signal, and noise(x) denotes the noise signal.
Step three: input the noisy speech signal generated in step two into any speech enhancement algorithm for processing to obtain a speech enhancement signal.
Note that speech enhancement is essentially speech noise reduction. In daily life, the speech collected by the microphone 120 is usually speech "polluted" by various kinds of noise, and the main purpose of speech enhancement is to recover the desired clean speech from such noisy speech. Speech enhancement algorithms fall into two broad categories: methods based on digital signal processing and methods based on machine learning. Since speech enhancement algorithms are well known in the art, they are not described further here.
Step four: feed the clean speech signal and the speech enhancement signal into PESQ to calculate a speech enhancement quality score. PESQ is a well-known, general-purpose open-source technique and is not described further here.
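Because the method accepts any enhancement algorithm, a simple stand-in is useful for exercising the evaluation pipeline. The sketch below is classic magnitude spectral subtraction, one of the DSP-based methods mentioned above; it is not the algorithm the patent evaluates, and its parameters (`frame_len`, `noise_frames`, `floor`) are illustrative choices. It assumes NumPy, an even frame length, and that the first few frames contain noise only.

```python
import numpy as np


def spectral_subtraction(noisy: np.ndarray, frame_len: int = 512,
                         noise_frames: int = 10, floor: float = 0.02) -> np.ndarray:
    """Basic magnitude spectral subtraction with 50% overlap-add.

    Assumes the first `noise_frames` frames contain noise only and that `frame_len`
    is even. 512 samples is 32 ms at 16 kHz.
    """
    hop = frame_len // 2
    n = np.arange(frame_len)
    # Periodic Hann window: copies shifted by `hop` sum to exactly 1, so plain
    # overlap-add reconstructs the input when no subtraction is applied.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)
    x = np.asarray(noisy, dtype=np.float64)
    padded = np.concatenate([np.zeros(hop), x, np.zeros(frame_len)])
    n_frames = (len(padded) - frame_len) // hop + 1
    frames = np.stack([padded[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Noise estimate: average magnitude spectrum of the leading frames.
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract, but keep a small spectral floor to limit "musical noise".
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    out_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(padded))
    for i, frame in enumerate(out_frames):
        out[i * hop:i * hop + frame_len] += frame
    # Drop the padding so the output lines up sample-for-sample with the input.
    return out[hop:hop + len(x)]
```

Any other enhancer, including a machine-learning model, can replace this function as long as it maps a noisy array to an enhanced array of the same length.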
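For step four, a thin wrapper around an existing PESQ implementation is enough. This sketch assumes the third-party Python package `pesq` (installed with `pip install pesq`), whose `pesq(fs, ref, deg, mode)` function implements ITU-T P.862 and its wideband extension P.862.2 with mode `'nb'` or `'wb'`. The wrapper name, the trimming to equal length, and the injectable `scorer` are illustrative choices, not part of the patent.

```python
from typing import Callable, Optional

import numpy as np


def pesq_score(clean: np.ndarray, enhanced: np.ndarray, fs: int = 16000,
               scorer: Optional[Callable[[int, np.ndarray, np.ndarray], float]] = None) -> float:
    """Score `enhanced` against the clean reference.

    By default this calls the third-party `pesq` package; `scorer` lets tests or
    other backends replace it with a function of (fs, ref, deg).
    """
    # Compare equal-length signals; trimming to the shorter one is an assumed policy.
    n = min(len(clean), len(enhanced))
    ref = np.asarray(clean[:n], dtype=np.float64)
    deg = np.asarray(enhanced[:n], dtype=np.float64)
    if scorer is not None:
        return float(scorer(fs, ref, deg))
    if fs not in (8000, 16000):
        raise ValueError("PESQ supports only 8 kHz (narrowband) or 16 kHz (wideband) audio")
    from pesq import pesq  # imported lazily so this module loads without the package
    mode = "wb" if fs == 16000 else "nb"
    return float(pesq(fs, ref, deg, mode))
```

With the 16 kHz clean speech used in this embodiment, the wrapper selects wideband mode automatically.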
Referring to fig. 4, an embodiment of the present invention provides a speech enhancement quality assessment apparatus, including:
a data acquisition module 201, configured to acquire clean voice signals and noise data;
a noisy speech signal generating module 202, configured to process the clean speech signal and the noise data according to a preset evaluation content to generate a noisy speech signal; wherein the evaluation content includes a generation scenario of noise data and a signal-to-noise ratio of a noisy speech signal;
a speech enhancement signal generation module 203, configured to process the noisy speech signal by using a speech enhancement algorithm to generate a speech enhancement signal;
and the voice enhancement quality evaluation module 204 is configured to import the clean voice signal and the voice enhancement signal into PESQ to calculate a voice enhancement quality score.
Specifically, the noisy speech signal generating module 202 includes the following sub-units:
the noise signal generation subunit is used for selecting a plurality of pieces of noise data to be superposed so as to generate a noise signal;
the scaling coefficient calculation subunit is used for calculating the scaling coefficient of the corresponding noise signal according to the signal-to-noise ratio of the voice signal with noise;
and the noisy speech signal generating subunit is used for carrying out scaling processing on the noise signal according to the scaling coefficient and superposing the scaled noise signal and the clean speech signal to generate a noisy speech signal.
Further, the data acquisition module 201 includes a noise data acquisition sub-module for acquiring noise data, the noise data acquisition sub-module including the following sub-units:
the acquisition subunit is used for collecting noise data by recording, or by downloading an open-source database over a network;
and the noise library generating subunit is used for generating a noise library according to the collected noise data.
In summary, the speech enhancement quality assessment apparatus in this embodiment uses a clean speech signal as an original signal, superimposes different types of noise to generate a noisy speech signal before performing speech enhancement, then processes the noisy speech signal through a speech enhancement algorithm to generate a speech enhancement signal, and finally introduces the clean original speech signal and the speech enhancement signal into PESQ to obtain a speech enhancement quality assessment score.
It should be noted that the device embodiment and the method embodiment of the present invention are based on the same inventive concept; details already given for the method embodiment are therefore not repeated for the device embodiment.
FIG. 5 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the conference terminal 110 in fig. 1. As shown in fig. 5, the computer apparatus includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the speech enhancement quality assessment method. The internal memory may also have stored therein a computer program that, when executed by the processor, causes the processor to perform a speech enhancement quality assessment method. Those skilled in the art will appreciate that the configuration shown in fig. 5 is a block diagram of only a portion of the configuration associated with aspects of the present invention and is not intended to limit the computing devices to which aspects of the present invention may be applied, and that a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, the speech enhancement quality assessment apparatus provided by the present application may be implemented in the form of a computer program that is executable on a computer device such as that shown in fig. 5. The memory of the computer device may store various program modules constituting the speech enhancement quality evaluation apparatus, such as a data acquisition module 201, a noisy speech signal generation module 202, a speech enhancement signal generation module 203, and a speech enhancement quality evaluation module 204 shown in fig. 4. The respective program modules constitute computer programs that cause a processor to execute the steps in the speech enhancement quality evaluation method of the respective embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 5 may perform the step of acquiring clean speech signal and noise data by the data acquisition module 201 in the speech enhancement quality assessment apparatus shown in fig. 4. The step of processing the clean speech signal and the noise data according to a preset evaluation content to generate a noisy speech signal is performed by a noisy speech signal generating module 202, wherein the evaluation content includes a generation scene of the noise data and a signal-to-noise ratio of the noisy speech signal. The step of processing the noisy speech signal with a speech enhancement algorithm to generate a speech enhancement signal is performed by a speech enhancement signal generation module 203. The step of importing the clean speech signal and the speech enhancement signal into PESQ to calculate a speech enhancement quality score is performed by a speech enhancement quality evaluation module 204.
In one embodiment, there is provided a conference terminal including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to perform the steps of the above-described speech enhancement quality assessment method. Here, the steps of the speech enhancement quality evaluation method may be the steps in the speech enhancement quality evaluation methods of the above-described respective embodiments.
In one embodiment, a computer-readable storage medium is provided, which stores computer-executable instructions for causing a computer to perform the steps of the above-described speech enhancement quality assessment method. Here, the steps of the speech enhancement quality evaluation method may be the steps in the speech enhancement quality evaluation methods of the above-described respective embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

Claims (10)

1. A speech enhancement quality assessment method, characterized in that it comprises the following steps:
a data acquisition step: acquiring a clean speech signal and noise data;
a noisy speech signal generation step: processing the clean speech signal and the noise data according to preset evaluation content to generate a noisy speech signal, wherein the evaluation content includes the generation scenario of the noise data and the signal-to-noise ratio of the noisy speech signal;
a speech enhancement signal generation step: processing the noisy speech signal with a speech enhancement algorithm to generate a speech enhancement signal;
a speech enhancement quality evaluation step: feeding the clean speech signal and the speech enhancement signal into PESQ (Perceptual Evaluation of Speech Quality, ITU-T P.862) to calculate a speech enhancement quality score.
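The four steps of claim 1 can be sketched as a loop over the evaluation content, that is, every combination of noise scenario and target SNR. This is a minimal sketch under stated assumptions: the function name `evaluate_enhancement`, the dictionary of noise signals keyed by scenario, and the three injected callables are illustrative, and `score` merely stands in for a PESQ implementation, which the claim names but does not specify.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Signal = List[float]


def evaluate_enhancement(
    clean: Signal,
    noise_by_scene: Dict[str, Signal],
    snrs_db: Sequence[float],
    make_noisy: Callable[[Signal, Signal, float], Signal],
    enhance: Callable[[Signal], Signal],
    score: Callable[[Signal, Signal], float],
) -> Dict[Tuple[str, float], float]:
    """Score one enhancement algorithm for every (noise scene, SNR) pair.

    Mirrors the four steps of claim 1: the acquired data are passed in,
    a noisy signal is generated per evaluation condition, the algorithm
    under test enhances it, and a full-reference metric compares the
    enhanced output with the clean reference.
    """
    results: Dict[Tuple[str, float], float] = {}
    for scene, noise in noise_by_scene.items():
        for snr_db in snrs_db:
            noisy = make_noisy(clean, noise, snr_db)  # noisy speech generation step
            enhanced = enhance(noisy)  # speech enhancement step
            # Quality evaluation step: reference (clean) first, degraded second.
            results[(scene, snr_db)] = score(clean, enhanced)
    return results
```

Injecting the metric keeps the harness independent of any one PESQ implementation. For instance, `score` could wrap the third-party `pesq` package from PyPI, whose `pesq(fs, ref, deg, mode)` function takes the reference signal before the degraded one; consult that package's documentation for its supported sampling rates and modes before relying on it.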
2. The speech enhancement quality assessment method according to claim 1, wherein the noisy speech signal generation step comprises:
selecting a plurality of pieces of noise data and superposing them to generate a noise signal;
calculating a scaling coefficient for the noise signal according to the signal-to-noise ratio of the noisy speech signal; and
scaling the noise signal by the scaling coefficient and superposing the scaled noise signal on the clean speech signal to generate the noisy speech signal.
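The first sub-step of claim 2, superposing several pieces of noise data into one noise signal, might look like the sketch below. The claim does not say how clips are chosen or how differing lengths are reconciled, so random selection with a fixed seed and tiling or truncating each clip to the target length are assumptions, as are the helper names `fit_length` and `superpose_noise`.

```python
import random
from typing import List, Sequence


def fit_length(clip: Sequence[float], length: int) -> List[float]:
    """Tile (repeat) or truncate a clip to exactly `length` samples.

    Assumption: the claim does not specify how clip lengths are matched.
    """
    if not clip:
        raise ValueError("empty noise clip")
    return [clip[i % len(clip)] for i in range(length)]


def superpose_noise(
    library: Sequence[Sequence[float]], k: int, length: int, seed: int = 0
) -> List[float]:
    """Pick k distinct clips from the library and add them sample by sample."""
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    chosen = rng.sample(list(library), k)  # raises ValueError if k > len(library)
    mixed = [0.0] * length
    for clip in chosen:
        for i, value in enumerate(fit_length(clip, length)):
            mixed[i] += value
    return mixed
```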
3. The speech enhancement quality assessment method according to claim 2, wherein said scaling factor is calculated as follows:
(The formula appears as an image, FDA0002808623210000011, in the original publication and is not reproduced in this text.)
where α_noise represents the scaling factor of the noise signal and snr represents the signal-to-noise ratio of the noisy speech signal.
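Because the claim's own formula is not reproduced, the sketch below uses the conventional power-ratio definition of SNR instead; this is an assumption and may differ from the patent's expression. Choosing α so that 10·log10(P_speech / (α²·P_noise)) = snr gives α = sqrt(P_speech / (P_noise · 10^(snr/10))), where P denotes mean power. The function names `mean_power` and `noise_scale` are hypothetical.

```python
import math
from typing import Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average of squared samples."""
    return sum(v * v for v in x) / len(x)


def noise_scale(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> float:
    """Return alpha such that 10*log10(P_speech / P(alpha * noise)) == snr_db.

    Assumed standard SNR definition, not the patent's (unreproduced) formula.
    """
    p_speech, p_noise = mean_power(speech), mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    return math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
```

Because α is a ratio of powers, the result does not depend on the absolute level of either recording, only on their relative level.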
4. The speech enhancement quality assessment method according to claim 3, wherein the noisy speech signal is generated as follows:
$$\mathrm{noisy}(x) = \mathrm{speech}(x) + \alpha_{\mathrm{noise}} \cdot \mathrm{noise}(x)$$
where noisy(x) represents the noisy speech signal, speech(x) represents the clean speech signal, and noise(x) represents the noise signal.
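The claim 4 mixture is a sample-wise sum. The sketch below implements it and adds a check that recovers the added noise as noisy(x) − speech(x) to report the SNR actually achieved, which is useful when validating a generated test set. The function names are hypothetical, and the SNR check assumes the conventional power-ratio definition rather than the patent's unreproduced formula.

```python
import math
from typing import List, Sequence


def make_noisy(speech: Sequence[float], noise: Sequence[float], alpha: float) -> List[float]:
    """noisy(x) = speech(x) + alpha_noise * noise(x), computed sample by sample.

    Noise samples beyond the length of the clean speech are ignored.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the clean speech")
    return [s + alpha * n for s, n in zip(speech, noise)]


def achieved_snr_db(speech: Sequence[float], noisy: Sequence[float]) -> float:
    """Recover the added noise as noisy - speech and return 10*log10(P_speech / P_noise)."""
    residual = [y - s for y, s in zip(noisy, speech)]
    p_speech = sum(s * s for s in speech)
    p_residual = sum(r * r for r in residual)
    return 10.0 * math.log10(p_speech / p_residual)
```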
5. The speech enhancement quality assessment method according to claim 1, wherein acquiring the noise data comprises:
collecting noise data by recording or by downloading from a source database over a network; and
generating a noise library from the collected noise data.
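Claim 5 leaves the structure of the noise library open. One possible sketch, assuming a hypothetical directory layout `root/<scene>/<clip>.wav` with 16-bit mono PCM files, indexes recorded or downloaded clips by their generation scenario so that the scenario named in the evaluation content can be looked up directly. It uses only Python's standard `wave` and `array` modules; the function names are illustrative.

```python
import array
import sys
import wave
from pathlib import Path
from typing import Dict, List


def read_pcm16_mono(path: Path) -> List[float]:
    """Read a 16-bit mono WAV file into floats in [-1.0, 1.0)."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError(f"{path}: expected 16-bit mono PCM")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in samples]


def build_noise_library(root: Path) -> Dict[str, List[List[float]]]:
    """Map each scene directory name to the list of clips it contains.

    Assumed (hypothetical) layout: root/<scene>/<clip>.wav
    """
    library: Dict[str, List[List[float]]] = {}
    for wav_path in sorted(root.glob("*/*.wav")):
        scene = wav_path.parent.name
        library.setdefault(scene, []).append(read_pcm16_mono(wav_path))
    return library
```

Keying clips by scene lets the noise signal generation step of claim 2 draw only from the scenario under evaluation, for example `superposable = library["office"]`.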
6. A speech enhancement quality assessment apparatus, characterized in that the apparatus comprises:
a data acquisition module, configured to acquire a clean speech signal and noise data;
a noisy speech signal generation module, configured to process the clean speech signal and the noise data according to preset evaluation content to generate a noisy speech signal, wherein the evaluation content includes the generation scenario of the noise data and the signal-to-noise ratio of the noisy speech signal;
a speech enhancement signal generation module, configured to process the noisy speech signal with a speech enhancement algorithm to generate a speech enhancement signal; and
a speech enhancement quality evaluation module, configured to feed the clean speech signal and the speech enhancement signal into PESQ to calculate a speech enhancement quality score.
7. The speech enhancement quality assessment apparatus according to claim 6, wherein said noisy speech signal generation module comprises the following sub-units:
the noise signal generation subunit is used for selecting a plurality of pieces of noise data to be superposed so as to generate a noise signal;
the scaling coefficient calculation subunit is used for calculating the scaling coefficient of the noise signal according to the signal-to-noise ratio of the noisy speech signal;
and the noisy speech signal generating subunit is used for carrying out scaling processing on the noise signal according to the scaling coefficient and superposing the scaled noise signal and the clean speech signal to generate a noisy speech signal.
8. The speech enhancement quality assessment apparatus according to claim 6, wherein said data acquisition module comprises a noise data acquisition sub-module for acquiring noise data, said noise data acquisition sub-module comprising the sub-units of:
the acquisition subunit is used for collecting the noise data by recording or by downloading from a source database over a network;
and the noise library generating subunit is used for generating a noise library according to the collected noise data.
9. A speech enhancement quality assessment terminal, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the speech enhancement quality assessment method according to any of claims 1 to 5 when executing the program.
10. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the speech enhancement quality assessment method according to any one of claims 1 to 5.
CN202011376869.8A 2020-11-30 2020-11-30 Voice enhancement quality evaluation method, device, terminal and storage medium Pending CN112530460A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011376869.8A CN112530460A (en) 2020-11-30 2020-11-30 Voice enhancement quality evaluation method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN112530460A 2021-03-19

Family

ID=74995557

Country Status (1)

Country Link
CN (1) CN112530460A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080255834A1 (en) * 2004-09-17 2008-10-16 France Telecom Method and Device for Evaluating the Efficiency of a Noise Reducing Function for Audio Signals
CN101710490A (en) * 2009-11-20 2010-05-19 安徽科大讯飞信息科技股份有限公司 Method and device for compensating noise for voice assessment
CN110222781A (en) * 2019-06-12 2019-09-10 成都嗨翻屋科技有限公司 Audio denoising method, device, user terminal and storage medium
CN110491406A (en) * 2019-09-25 2019-11-22 电子科技大学 Dual-noise speech enhancement method with multi-mode suppression of different kinds of noise
CN110517708A (en) * 2019-09-02 2019-11-29 平安科技(深圳)有限公司 Audio processing method, device and computer storage medium
CN110600022A (en) * 2019-08-12 2019-12-20 平安科技(深圳)有限公司 Audio processing method and device and computer storage medium
CN110853664A (en) * 2019-11-22 2020-02-28 北京小米移动软件有限公司 Method, apparatus and electronic device for evaluating the performance of speech enhancement algorithm

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114694683A (en) * 2022-05-09 2022-07-01 北京达佳互联信息技术有限公司 Speech enhancement evaluation method, and training method and device of speech enhancement evaluation model
CN114694683B (en) * 2022-05-09 2025-04-11 北京达佳互联信息技术有限公司 Speech enhancement evaluation method, speech enhancement evaluation model training method and device
CN115346518A (en) * 2022-07-05 2022-11-15 科大讯飞股份有限公司 Voice simulation signal acquisition method, voice recognition method, voice simulation signal acquisition device, voice recognition device, voice simulation equipment and voice simulation medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210319