CN119068898B - Adaptive noise reduction method based on frequency point gain smoothing and post-filter

Info

Publication number: CN119068898B
Application number: CN202411557581.9A
Authority: CN
Inventors: 周智; 仇健乐; 于欣; 蒋寿美
Original assignee: Time Intelligence Technology Shanghai Co ltd
Current assignee: Time Intelligence Technology Shanghai Co ltd
Priority date: 2024-11-04
Filing date: 2024-11-04
Publication date: 2025-02-07
Anticipated expiration: 2044-11-04
Also published as: CN119068898A

Abstract

The present invention relates to an adaptive noise reduction method and a post-filter based on frequency gain smoothing. The present invention converts a speech signal into a spectrum signal, and then performs gain, and performs smoothing processing on the gain process, and the smoothing processing includes forward smoothing and/or backward smoothing; through the processing mode of forward smoothing and backward smoothing, the frequency gain of speech conversion is smoothed according to the algorithm processing, and the new gain replaces the original gain, so as to obtain smooth and natural frequency data. Accordingly, the present method can improve the naturalness of speech after noise reduction, especially in the scene with low signal-to-noise ratio; the present method significantly reduces the "music noise" caused by excessive noise reduction and spectrum mutation; the present method is a general method, which can be applied to different noise reduction algorithms (models) based on different parameters or variants; the present method is simple to implement and has low computational complexity, and can be combined with other post-filters to achieve better noise reduction and speech preservation effects.

Description

Adaptive noise reduction method based on frequency point gain smoothing and post-filter

Technical Field

The invention relates to a processing technology of audio frequency points, in particular to a technology for avoiding distortion caused by excessively strong gain of the audio frequency points through gain smoothing processing.

Background

Audio signals generally contain noise, and in audio processing scenarios such as intercom conversation and recording, adaptive noise reduction (ANS) is one of the most widely used technologies. There are many kinds of noise reduction techniques based on different models, and a key problem they all face is speech fidelity: many algorithms damage the speech while reducing the noise. In academic research, a scheme that achieves a high signal-to-noise ratio is considered feasible, but in practical application the speech is audibly damaged, and musical noise can even appear when the signal-to-noise ratio is low, which is unacceptable.

Several prior art solutions address this problem. A widely used one is binary masking, in which the gain at each frequency point is set to either 0 or 1: each frequency point is classified as speech or noise, noise is removed completely, and speech is passed through completely. The speech obtained in this way can in some cases mitigate the above problem, but it still does not sound good enough.

To solve the problem better, the present method improves voice quality on top of the gains produced by an existing noise reduction algorithm by exploiting the correlation between adjacent frequency points: the gain should not vary too sharply between adjacent frequency points, and abrupt gain changes such as those of a binary mask are avoided.

Disclosure of Invention

The main object of the invention is to provide a method that smooths the gains of audio frequency points by an algorithm, so as to avoid distortion caused by abrupt changes in the data.

In order to achieve the above object, the present invention provides an adaptive noise reduction method based on frequency point gain smoothing, which converts a speech signal into a spectrum signal and then performs gain, which performs smoothing processing on a gain process, wherein the smoothing processing includes forward smoothing and/or backward smoothing;

Wherein, forward smoothing:

Take $g'(K) = 0$, where $K$ is the number of frequency points;

For $k = K-1, K-2, \ldots, 0$, compute $\max\{\alpha\, g'(k+1),\ g(k)\}$;

Wherein, $\alpha$ is the gain attenuation coefficient;

the original gain is replaced by the new gain, $g'(k) = \max\{\alpha\, g'(k+1),\ g(k)\}$;

Wherein, $g'(K)$ is the virtual gain of the frequency point just beyond the highest frequency, used as the initial point of calculation;

For each actual frequency point $k$, $g(k)$ is its original gain, derived by a noise reduction algorithm;

$g'(k+1)$ is the adjusted gain of the previously processed, higher frequency point;

In addition, backward smoothing:

Take $g'(-1) = 0$;

For $k = 0, 1, \ldots, K-1$, compute $\max\{\alpha\, g'(k-1),\ g(k)\}$;

The original gain is replaced by the new gain, $g'(k) = \max\{\alpha\, g'(k-1),\ g(k)\}$;

$g'(-1)$ is the virtual gain of the frequency point just below the lowest frequency, used as the initial point of calculation;

For each actual frequency point $k$, $g(k)$ is its original gain, derived by a noise reduction algorithm or by a preceding adjustment;

$g'(k-1)$ is the adjusted gain of the previously processed, lower frequency point.

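Preferably, further comprising time smoothing;

Take $g'_{-1}(k) = 0$;

For $t = 0, 1, 2, \ldots$, compute $\max\{\alpha\, g'_{t-1}(k),\ g_t(k)\}$;

The original gain is replaced by the new gain, $g'_t(k) = \max\{\alpha\, g'_{t-1}(k),\ g_t(k)\}$;

Here $g_t(k)$ is the gain of the $t$-th frame at the current frequency point $k$.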

Preferably, this is accomplished by the steps of:

preprocessing, namely framing the real-time voice signal and converting it into a spectrum by FFT (fast Fourier transform), which are steps 1 and 2;

ANS noise reduction, in which the gain is obtained by a noise reduction algorithm, which is step 3;

post filtering, namely adjusting the gains, which are steps 4, 5 and 6;

returning to the time domain signal, in which the spectrum is converted to the time domain by an inverse FFT and synthesized into an audio signal, which are steps 7 and 8.

Preferably, this is accomplished by the steps of:

step 1, real-time voice signal framing:

the voice signal is a signal stream $y(n)$ obtained after device sampling or pre-algorithm processing;

Each sampling point is sampled with a certain number of bits and must be normalized so that $|y(n)| \le 1$;

In real-time processing, every $N$ sampling points form one frame with a frame shift of $M$ samples, i.e. the $t$-th frame signal is $y_t(n) = y(tM + n)$, where $n = 0, 1, \ldots, N-1$;

Step 2, the speech signal $y_t$ is transformed by FFT into the spectrum $Y_t = \mathcal{F}(w \cdot y_t)$, wherein

$w$ is the analysis window;

$\mathcal{F}$ is the discrete Fourier transform;

$Y_t(k)$ is the complex spectrum, in which $k = 0, 1, \ldots, K-1$ is the frequency point index;

step 3, a signal processing noise reduction algorithm is used to obtain the gain $g_t(k)$ at each frequency point. In general $0 \le g_t(k) \le 1$, i.e. the gain is real, which means that the phase is unchanged; $\hat{S}_t(k)$ is the noise-reduced spectrum, i.e. $\hat{S}_t(k) = g_t(k)\, Y_t(k)$;

Step 4, namely adopting the forward smoothing step;

step 5, namely adopting the backward smoothing step;

Step 6, namely adopting the time smoothing.

Preferably, the method further comprises the following steps:

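Step 7, updating the spectrum signal: $\hat{S}_t(k) = g'_t(k)\, Y_t(k)$, where $g'_t(k)$ is the smoothed gain and $Y_t(k)$ is the noisy spectrum of frame $t$;

Step 8, returning to the time domain signal by inverse Fourier transform: $\hat{s}_t = \mathcal{F}^{-1}(\hat{S}_t)$;

the synthesis window used for synthesizing the signal is determined by the windowing mode.

Preferably, when the frame length $N$ is twice the frame shift $M$, i.e. $N = 2M$, and the analysis window $w$ is a Hanning window, the synthesis window is a unit window;

the speech signal $\hat{s}(n)$ is synthesized frame by frame in overlap-add mode.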

Preferably, for an audio file at a 16 kHz sampling rate, 32 ms is taken as one frame with a 16 ms frame shift, i.e. a frame length of $N = 512$ samples and a frame shift of $M = 256$ samples.
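The framing parameters above can be checked with a short sketch. It is illustrative only: the function names, the divisor 32768 used to normalize 16-bit samples, and the choice to drop a trailing partial frame are assumptions, not details taken from the patent.

```python
from typing import List, Sequence, Tuple


def frame_params(sample_rate_hz: int, frame_ms: int, shift_ms: int) -> Tuple[int, int]:
    """Convert frame length and frame shift from milliseconds to samples."""
    return sample_rate_hz * frame_ms // 1000, sample_rate_hz * shift_ms // 1000


def normalize_pcm16(samples: Sequence[int]) -> List[float]:
    """Map 16-bit integer samples into [-1, 1) (assumed normalization)."""
    return [s / 32768.0 for s in samples]


def split_frames(signal: Sequence[float], frame_len: int, frame_shift: int) -> List[List[float]]:
    """Cut the stream into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(signal) - frame_len
    return [list(signal[start:start + frame_len])
            for start in range(0, last_start + 1, frame_shift)]


frame_len, frame_shift = frame_params(16000, 32, 16)
print(frame_len, frame_shift)  # 512 256

stream = normalize_pcm16([0, 16384, -32768, 32767] * 256)  # 1024 samples
frames = split_frames(stream, frame_len, frame_shift)
print(len(frames))  # 3
```

With a frame shift equal to half the frame length, each sample away from the stream edges belongs to exactly two frames, which is the situation the overlap-add synthesis relies on.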

Preferably, the attenuation coefficient $\alpha$ in the forward smoothing and the backward smoothing takes the same or different values;

The attenuation coefficient $\alpha$ is adjusted according to the signal-to-noise ratio, i.e. $\alpha$ is reduced when the signal-to-noise ratio is high and increased when the signal-to-noise ratio is low;

The exponential decay term, i.e. the previous adjusted gain multiplied by $\alpha$, is replaced by a linear decay term.

The invention also provides a self-adaptive noise reduction post-filter based on the frequency point gain smoothing, and the post-filter is processed by adopting the noise reduction method.

Preferably, the processing is performed by using independently arranged post-filters at the time of forward smoothing, backward smoothing and time smoothing.

The method has the beneficial effects that through the forward smoothing and backward smoothing processing mode, the frequency point gain of voice conversion is processed according to an algorithm to realize smoothing processing, and the original gain is replaced by the new gain, so that smooth and natural frequency point data are obtained. Correspondingly, the method can improve the naturalness of the noise-reduced voice, particularly in a scene with low signal-to-noise ratio, obviously reduces the music noise caused by excessive noise reduction and frequency spectrum mutation, is a general method, can be applied to different noise reduction algorithms (models) based on different parameters or varieties, is simple to realize and low in calculation complexity, and can be combined with other post-filtering to achieve better noise reduction and voice preservation effects.

Drawings

Fig. 1 compares the audio data before and after processing; the time axis at the top ranges from 25 s to 39 s, and the changes produced by the processing are marked in four areas.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.

The principle of the invention is explained as follows:

A voice recording is a time sequence $y(n)$ that can be regarded as the sum of two sounds, the speech $s(n)$ and the noise $d(n)$, i.e. $y(n) = s(n) + d(n)$. The purpose of a noise reduction or speech separation algorithm is to compute $s$ from $y$.

Because sound is processed in real time, piece by piece, each processing step sees only a short segment of data, e.g. tens of milliseconds; this is step 1, framing. The speech is typically processed in the frequency domain, which is step 2, so that $Y$ is the spectrum of the noisy speech and $S$, $D$ are the spectra of the speech and the noise, respectively. Computing the gain is the job of the noise reduction algorithm, step 3. Here it is assumed that the speech and the noise are mutually independent random variables and that the signal processing algorithm does not change the phase, so estimating the speech $\hat{S}$ means taking $\hat{S} = g\,Y$. For example, let $P_y$ and $P_d$ be the power spectra of the noisy speech and of the noise, respectively; then $g = (P_y - P_d)/P_y$ is a gain estimate. As a numerical example, if the energy of the noisy (original) speech is 100 and the noise energy is 10, one of the simplest estimates of the speech energy is 90, i.e. a gain $g = 0.9$. Step 7 applies the gain to $Y$ to obtain the estimated speech, and step 8 returns from the frequency domain to the time domain, yielding the speech stream that we normally hear.
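The numerical example above (noisy energy 100, noise energy 10, gain 0.9) can be reproduced with a minimal sketch. The function name and the clamping of negative speech estimates to zero are illustrative assumptions; the patent leaves the choice of noise reduction algorithm open.

```python
def power_subtraction_gain(noisy_power: float, noise_power: float) -> float:
    """Simplest gain estimate: estimated speech power divided by noisy power.

    The speech power estimate is clamped at zero (an assumption) so that the
    gain always stays within [0, 1].
    """
    if noisy_power <= 0.0:
        return 0.0
    speech_power = max(noisy_power - noise_power, 0.0)
    return speech_power / noisy_power


gain = power_subtraction_gain(100.0, 10.0)
print(gain)  # 0.9
```

A bin whose noise estimate exceeds its measured power gets gain 0 under this rule, which is exactly the kind of aggressive per-bin decision that the post-filter described below is meant to soften.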

A practical noise reduction algorithm may need to take into account how fast the noise changes, how accurate the noise estimate is, and so on (subject to some statistical model), and most algorithms tend to suppress noise deeply, so that the noise is almost gone but the speech is also badly damaged. This is not a problem in academic papers, which care about metrics rather than listening experience, but it is a problem in practical use, where we would rather tolerate some residual noise than damage the speech greatly. Steps 4, 5 and 6 below therefore preserve the speech by preventing abrupt changes in gain. The basic rule is that a gain is never reduced, only raised: if the previous gain was 0.9 and the next one suddenly drops to 0.1, the drop is considered too large. We set a coefficient, for example $\alpha = 0.8$; since the previous gain is 0.9, the minimum allowed gain for the next point is 0.9 × 0.8 = 0.72. Steps 4, 5 and 6 apply this "next point" rule in three directions: toward lower frequencies, toward higher frequencies, and toward the next frame, respectively. The signal thus becomes more stable and sounds better.
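The single-step rule in the paragraph above amounts to taking a maximum. The following sketch, with a hypothetical function name, reproduces the 0.9 to 0.1 example with a coefficient of 0.8.

```python
def limit_drop(previous_adjusted: float, current: float, alpha: float) -> float:
    """Return the current gain, raised if needed to alpha times the previous one."""
    return max(alpha * previous_adjusted, current)


raised = limit_drop(0.9, 0.1, 0.8)      # drop too large: floor of about 0.72 applies
unchanged = limit_drop(0.9, 0.85, 0.8)  # drop is mild: the original gain is kept
print(round(raised, 6), unchanged)
```

Because the rule only ever raises a gain, it can leave slightly more noise in the output but never removes more speech than the underlying noise reduction algorithm already did.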

For each actual frequency point $k$, $g(k)$ is its original gain, obtained by an existing noise reduction algorithm. Such an algorithm iterates frame by frame at each frequency point and does not consider the gains of adjacent frequency points, so it can produce very aggressive gains, i.e. speech that is not smooth or natural; $g(k)$ therefore needs to be smoothed by the present algorithm.

Fig. 1 demonstrates the effect of processing data with the method of the present invention: the first channel is the original noise reduction result, and the second channel is the result enhanced by the present method. After processing by the present method, many details are recovered in the speech frequency domain, and gaps (empty regions) that impair fluency are partly filled in, for example at 28-29 seconds (second region) and 31-32 seconds (third region).

Example 1

The embodiment provides a self-adaptive noise reduction method based on frequency point gain smoothing, which is used for converting a voice signal into a frequency spectrum signal and then carrying out gain, wherein the gain process is smoothed by the self-adaptive noise reduction method, and the smoothing process comprises forward smoothing and/or backward smoothing;

wherein, forward smoothing of step 4:

Take $g'(K) = 0$, where $K$ is the number of frequency points;

For $k = K-1, K-2, \ldots, 0$, compute $\max\{\alpha\, g'(k+1),\ g(k)\}$;

Wherein, $\alpha$ is the gain attenuation coefficient. In this scheme $\alpha$ is a fixed gain attenuation coefficient with $0 \le \alpha \le 1$. If $\alpha = 0$, the post-filtering does not affect the original signal; the smaller $\alpha$ is, the more the original gain tends to be preserved, and the larger $\alpha$ is, the smoother the gain;

the original gain is replaced by the new gain, $g'(k) = \max\{\alpha\, g'(k+1),\ g(k)\}$;

Wherein, $g'(K)$ is the virtual gain of the frequency point just beyond the highest frequency. This frequency point is not applied in practice and serves here only as the initial point of calculation. Specifically, $k$ is the frequency point index; when $k$ is the highest frequency point $K-1$, the algorithm requires $g'(k+1)$, so the index $K$ is taken, i.e. the frequency point just outside the highest frequency.

For example, in one specific processing case, for a signal with a 16 kHz sampling rate processed with a 16 ms frame shift and a 32 ms frame length, the frequency domain indices range over {0, 1, ..., 255}. The starting point of the back-to-front calculation is then 256, which lies outside this range. Setting g(256) = 0 gives max{0, g(255)} = g(255); by induction, the value at 254 is obtained from the value at 255, and so on step by step down to the value at 1 and finally the value at 0, which completes the calculation.

Likewise, the front-to-back calculation starts from the virtual index -1 and proceeds from -1 to 0, from 0 to 1, and so on up to from 254 to 255; smoothing over time proceeds in the same way.

$g'(k+1)$ is the adjusted gain of the previously processed, higher frequency point. For the sake of audio quality, a large gap between the gains of adjacent frequency points is undesirable, so the smoothing is regulated by the coefficient $\alpha$;

Thereby $\max\{\alpha\, g'(k+1),\ g(k)\}$ is set as the adjusted gain of the frequency point;

This gain is used for the subsequent adjustment, thus setting $g'(k) = \max\{\alpha\, g'(k+1),\ g(k)\}$;
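The forward smoothing of step 4 can be sketched as a single sweep from the highest frequency point down to the lowest, starting from a virtual gain of 0 beyond the top bin. The function name and the sample gains are illustrative; the recurrence follows the description above (never lower a gain, raise it to at least the attenuated adjusted gain of the next higher bin).

```python
from typing import List, Sequence


def forward_smooth(gains: Sequence[float], alpha: float) -> List[float]:
    """Sweep from the highest bin to the lowest, limiting how fast gains may fall."""
    smoothed = [0.0] * len(gains)
    previous = 0.0  # virtual gain of the bin just beyond the highest frequency
    for k in range(len(gains) - 1, -1, -1):
        previous = max(alpha * previous, gains[k])
        smoothed[k] = previous
    return smoothed


original = [0.1, 0.1, 0.9, 0.1]
print(forward_smooth(original, 0.5))  # [0.225, 0.45, 0.9, 0.1]
print(forward_smooth(original, 0.0))  # [0.1, 0.1, 0.9, 0.1]
```

In this example the strong bin at index 2 pulls up the bins below it, while the bin above it is left untouched; with a coefficient of 0 the gains pass through unchanged, as stated in the text.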

In addition, step 5 backward smoothing:

Take $g'(-1) = 0$;

For $k = 0, 1, \ldots, K-1$, compute $\max\{\alpha\, g'(k-1),\ g(k)\}$;

The original gain is replaced by the new gain, $g'(k) = \max\{\alpha\, g'(k-1),\ g(k)\}$;

$g'(-1)$ is the virtual gain of the frequency point just below the lowest frequency. It is not applied in practice and serves only as the initial point of calculation: when $k$ is the lowest frequency point 0, the algorithm requires $g'(k-1)$, so the index $-1$ is taken, i.e. the frequency point just outside the lowest frequency. See the description of the specific processing case above.

For each actual frequency point $k$, $g(k)$ is its original gain, derived by a noise reduction algorithm or by a preceding adjustment;

The forward frequency-domain adjustment, the backward frequency-domain adjustment and the time-domain adjustment are independent of one another. The input of each adjustment is a set of gains, so the adjustments are nested functions, and their nesting order can be chosen freely.

If the gain of the original noise reduction algorithm is g, the forward filtering algorithm of the frequency domain is Apre, the backward filtering algorithm is Apost, and the time domain filtering is Atime, then the algorithm structure can be very flexible: Atime(Apre(Apost(g))) is the order described herein, Apre(Apost(Atime(g))) is another order, and so on. This is what is meant above by a gain obtained "from a preceding adjustment".

The noise reduction algorithm iterates frame by frame at each frequency point and does not consider the gains of adjacent frequency points, so very aggressive gains can result;

$g'(k-1)$ is the adjusted gain of the previously processed, lower frequency point. For the sake of audio quality, a large gap between the gains of adjacent frequency points is undesirable, so the smoothing is regulated by the coefficient $\alpha$;

Thereby $\max\{\alpha\, g'(k-1),\ g(k)\}$ is set as the adjusted gain of the frequency point;

This gain can be used for the subsequent adjustment, thus setting $g'(k) = \max\{\alpha\, g'(k-1),\ g(k)\}$;
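As a hedged illustration of the backward smoothing and of nesting the frequency-domain adjustments, the sketch below implements both sweeps with one hypothetical helper and applies them in both orders. The helper, the function names and the sample gains are assumptions made for the example.

```python
from typing import Iterable, List, Sequence


def _sweep(gains: Sequence[float], alpha: float, order: Iterable[int]) -> List[float]:
    """Visit bins in the given order, starting from a virtual gain of 0."""
    smoothed = list(gains)
    previous = 0.0
    for k in order:
        previous = max(alpha * previous, gains[k])
        smoothed[k] = previous
    return smoothed


def backward_smooth(gains: Sequence[float], alpha: float) -> List[float]:
    """Step 5: sweep from the lowest bin upward (virtual bin at index -1)."""
    return _sweep(gains, alpha, range(len(gains)))


def forward_smooth(gains: Sequence[float], alpha: float) -> List[float]:
    """Step 4: sweep from the highest bin downward (virtual bin at index K)."""
    return _sweep(gains, alpha, range(len(gains) - 1, -1, -1))


g = [0.1, 0.9, 0.1, 0.1]
print(backward_smooth(g, 0.5))                       # [0.1, 0.9, 0.45, 0.225]
print(forward_smooth(backward_smooth(g, 0.5), 0.5))  # [0.45, 0.9, 0.45, 0.225]
print(backward_smooth(forward_smooth(g, 0.5), 0.5))  # [0.45, 0.9, 0.45, 0.225]
```

For this input both nesting orders give the same gains; the time-domain adjustment can be nested the same way, since each stage simply maps one set of gains to another.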

Preferably, the method further comprises the time smoothing of step 6;

Take $g'_{-1}(k) = 0$;

For $t = 0, 1, 2, \ldots$, compute $\max\{\alpha\, g'_{t-1}(k),\ g_t(k)\}$;

The original gain is replaced by the new gain, $g'_t(k) = \max\{\alpha\, g'_{t-1}(k),\ g_t(k)\}$;

The gain calculation of the noise reduction algorithm is derived from an estimate of the signal-to-noise ratio and takes little account of how the gain compares with that of the preceding and following frames. Because the signal is largely continuous from one frame to the next, abrupt changes in gain between frames also tend to degrade signal quality or the listening experience, so the gain must be kept reasonably smooth across adjacent frames.

Similar to the previous description, $g_t(k)$ here is the gain of the $t$-th frame at the current frequency point $k$. Specifically, $t$ is the frame index: starting from frame $-1$ (a virtual frame), frame 0 is calculated, then frame 1, then frame 2, and so on.
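Time smoothing needs state that survives from one frame to the next, so a small class is a natural fit for a sketch. The class name, its interface and the sample frames are assumptions for illustration; the update rule is the per-bin maximum described above, with a virtual frame of zeros before frame 0.

```python
from typing import List, Sequence


class TimeSmoother:
    """Keeps the adjusted gains of the previous frame for every frequency bin."""

    def __init__(self, num_bins: int, alpha: float) -> None:
        self.alpha = alpha
        self.previous = [0.0] * num_bins  # virtual frame -1: all gains are 0

    def process(self, gains: Sequence[float]) -> List[float]:
        """Smooth one frame of gains against the previous adjusted frame."""
        if len(gains) != len(self.previous):
            raise ValueError("unexpected number of frequency bins")
        self.previous = [max(self.alpha * p, g) for p, g in zip(self.previous, gains)]
        return list(self.previous)


smoother = TimeSmoother(num_bins=2, alpha=0.5)
for frame_gains in ([0.8, 0.2], [0.1, 0.2], [0.1, 0.9]):
    print(smoother.process(frame_gains))
# [0.8, 0.2]
# [0.4, 0.2]
# [0.2, 0.9]
```

The first bin decays gradually after its strong first frame instead of dropping straight to 0.1, while a rising gain, as in the second bin of the last frame, is followed immediately.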

Example 2

On the basis of example 1, this is preferably done by the following steps:

preprocessing, namely framing the real-time voice signal and converting it into a spectrum by FFT (fast Fourier transform), which are steps 1 and 2;

ANS noise reduction (Automatic Noise Suppression, i.e. background noise suppression), in which the gain is obtained by a noise reduction algorithm, which is step 3;

post filtering, namely adjusting the gains, which are steps 4, 5 and 6;

Step 1, real-time voice signal framing:

Each sampling point is sampled with a certain number of bits, for example 16-bit sampling, and is normalized so that $|y(n)| \le 1$, where $y(n)$ is the input signal stream;

In real-time processing, every $N$ sampling points form one frame, i.e. the $t$-th frame signal is $y_t(n) = y(tM + n)$, where $n = 0, 1, \ldots, N-1$. Preferably, for an audio file at a 16 kHz sampling rate, 32 ms is taken as one frame with a 16 ms frame shift, i.e. $N = 512$, $M = 256$; that is, $N$ is the number of samples in a frame and $M$ is the number of samples in the frame shift.

Step 2, the speech signal $y_t$ is transformed by FFT into the spectrum $Y_t = \mathcal{F}(w \cdot y_t)$, wherein

$w$ is the analysis window;

$\mathcal{F}$ is the discrete Fourier transform;

$Y_t(k)$ is the complex spectrum, in which $k = 0, 1, \ldots, K-1$ is the frequency point index;

Step 3, an existing signal processing noise reduction algorithm is used to obtain the gain $g_t(k)$ at each frequency point. In general $0 \le g_t(k) \le 1$, i.e. the gain is real, which means that the phase is unchanged; $\hat{S}_t(k)$ is the noise-reduced spectrum, i.e. $\hat{S}_t(k) = g_t(k)\, Y_t(k)$;

Step 4, adopting the forward smoothing step in embodiment 1;

Step 5, adopting the backward smoothing step in embodiment 1;

Step 6, the time smoothing step of example 1 was employed.

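Step 7, updating the spectrum signal: $\hat{S}_t(k) = g'_t(k)\, Y_t(k)$, where $g'_t(k)$ is the smoothed gain and $Y_t(k)$ is the noisy spectrum of frame $t$;

Step 8, returning to the time domain signal by inverse Fourier transform: $\hat{s}_t = \mathcal{F}^{-1}(\hat{S}_t)$;

The synthesis window used for synthesizing the signal is determined by the windowing mode. Preferably, when the frame length $N$ is twice the frame shift $M$, i.e. $N = 2M$, and the analysis window $w$ is a Hanning window, the synthesis window is a unit window;

The speech signal $\hat{s}(n)$ is synthesized frame by frame in overlap-add mode.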
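Steps 2, 7 and 8 and the overlap-add synthesis can be sketched end to end. This is a toy illustration under stated assumptions: a naive DFT stands in for the FFT, the periodic form of the Hann window is used (with a frame shift of half the frame length its shifted copies sum exactly to 1, which is why a unit synthesis window suffices), and `gain_fn` is a hypothetical callback that would contain the noise reduction and post-filter of steps 3 to 6.

```python
import cmath
import math
from typing import Callable, List, Sequence


def periodic_hann(length: int) -> List[float]:
    """Periodic Hann window (assumed variant of the Hanning window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / length) for i in range(length)]


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform, standing in for an FFT."""
    n = len(frame)
    return [sum(frame[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT; only the real part is kept."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * m / n)
                 for k in range(n)) / n).real
            for m in range(n)]


def process(signal: Sequence[float], frame_len: int,
            gain_fn: Callable[[Sequence[complex]], Sequence[float]]) -> List[float]:
    """Analysis window -> DFT -> per-bin gain -> inverse DFT -> overlap-add."""
    shift = frame_len // 2
    window = periodic_hann(frame_len)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, shift):
        windowed = [w * s for w, s in zip(window, signal[start:start + frame_len])]
        spectrum = dft(windowed)                                       # step 2
        cleaned = [g * c for g, c in zip(gain_fn(spectrum), spectrum)]  # step 7
        for i, value in enumerate(idft(cleaned)):                      # step 8
            output[start + i] += value  # overlap-add with a unit synthesis window
    return output


signal = [math.sin(0.3 * n) for n in range(32)]
restored = process(signal, 8, lambda spectrum: [1.0] * len(spectrum))
worst = max(abs(restored[i] - signal[i]) for i in range(4, 28))
print(worst < 1e-9)  # True
```

With all gains equal to 1 the input is recovered wherever two frames overlap; the first and last half frame are covered by only one window and are therefore excluded from the comparison.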

Example 3

The exponential decay term, i.e. the previous adjusted gain multiplied by $\alpha$, is replaced by a linear decay term: instead of the typical exponential decay of the Attack-Decay approach, a simple linear decay can also be more effective in certain situations.

This gives an additional implementation. Attack-Decay refers to the original implementation, which tracks large values quickly (Attack) and tracks small values with exponential decay (Decay); here the exponential decay is replaced by a linear decay.
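The difference between the two decay styles can be illustrated as follows. The exact linear-decay formula is not reproduced in this text, so the subtraction of a fixed step clamped at zero is an assumption made purely for illustration, as are the function names and the step size.

```python
from typing import Callable, List, Sequence


def exponential_decay(alpha: float) -> Callable[[float], float]:
    """Decay used in Examples 1 and 2: multiply the previous gain by alpha."""
    return lambda previous: alpha * previous


def linear_decay(step: float) -> Callable[[float], float]:
    """Assumed linear variant: subtract a fixed step, never going below 0."""
    return lambda previous: max(previous - step, 0.0)


def smooth(gains: Sequence[float], decay: Callable[[float], float]) -> List[float]:
    """Sweep low to high, following rises at once (attack) and limiting falls (decay)."""
    smoothed = []
    previous = 0.0
    for g in gains:
        previous = max(decay(previous), g)
        smoothed.append(previous)
    return smoothed


burst = [1.0, 0.0, 0.0, 0.0, 0.0]
print(smooth(burst, exponential_decay(0.5)))  # [1.0, 0.5, 0.25, 0.125, 0.0625]
print(smooth(burst, linear_decay(0.25)))      # [1.0, 0.75, 0.5, 0.25, 0.0]
```

Under this assumed form the linear variant reaches zero after a fixed number of points, whereas the exponential variant only approaches it.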

Example 4

The invention also provides a self-adaptive noise reduction post-filter based on frequency point gain smoothing, and the post-filter is processed by adopting the noise reduction method in the embodiments 1-3.

In order to better illustrate the solution of the present invention, some prior art documents are given below.

The following survey describes the general steps (e.g., steps 1-3):

Mahdi Parchami, Wei-Ping Zhu, Benoit Champagne, and Eric Plourde, Recent Developments in Speech Enhancement in the Short-Time Fourier Transform Domain, IEEE Circuits and Systems Magazine, vol. 16, no. 3, pp. 45-77, July 2016.

the classical noise reduction algorithm is described below:

Y. Ephraim and D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, no. 6, pp. 1109–1121, December 1984.

a very common noise reduction method is referred to as follows:

Timo Gerkmann and Richard C. Hendriks, Unbiased MMSE-Based Noise Power Estimation with Low Complexity and Low Tracking Delay, IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1383-1393, May 2012.

It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Claims

1. The adaptive noise reduction method based on the frequency point gain smoothing converts a voice signal into a frequency spectrum signal and then carries out gain, and is characterized in that the gain process is smoothed, the smoothing process comprises forward smoothing and/or backward smoothing and/or time smoothing, and the adaptive noise reduction method based on the frequency point gain smoothing is completed through the following steps:

step 1, real-time voice signal framing:

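$w$ is the analysis window;

$\mathcal{F}$ is the discrete Fourier transform;

$Y_t(k)$ is the complex spectrum of the $t$-th frame $y_t$, i.e. $Y_t = \mathcal{F}(w \cdot y_t)$, in which $k = 0, 1, \ldots, K-1$ is the frequency point index;

step 3, obtaining the gain $g_t(k)$ at each frequency point by a signal processing noise reduction algorithm, with $0 \le g_t(k) \le 1$, i.e. the gain is real, which means that the phase is unchanged; $\hat{S}_t(k)$ is the noise-reduced spectrum, i.e. $\hat{S}_t(k) = g_t(k)\, Y_t(k)$;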

Step 4, namely adopting the forward smoothing step;

step 5, namely adopting the backward smoothing step;

step 6, adopting the time smoothing;

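Wherein, forward smoothing:

Take $g'(K) = 0$, where $K$ is the number of frequency points;

For $k = K-1, K-2, \ldots, 0$, compute $\max\{\alpha\, g'(k+1),\ g(k)\}$;

Wherein, $\alpha$ is the gain attenuation coefficient;

the original gain is replaced by the new gain, $g'(k) = \max\{\alpha\, g'(k+1),\ g(k)\}$;

$g'(k+1)$ is the adjusted gain of the previously processed, higher frequency point;

In addition, backward smoothing:

Take $g'(-1) = 0$;

For $k = 0, 1, \ldots, K-1$, compute $\max\{\alpha\, g'(k-1),\ g(k)\}$;

The original gain is replaced by the new gain, $g'(k) = \max\{\alpha\, g'(k-1),\ g(k)\}$;

$g'(k-1)$ is the adjusted gain of the previously processed, lower frequency point;

In addition, time smoothing:

Take $g'_{-1}(k) = 0$;

For $t = 0, 1, 2, \ldots$, compute $\max\{\alpha\, g'_{t-1}(k),\ g_t(k)\}$;

The original gain is replaced by the new gain, $g'_t(k) = \max\{\alpha\, g'_{t-1}(k),\ g_t(k)\}$;

Here $g_t(k)$ is the gain of the $t$-th frame at the current frequency point $k$.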

2. The adaptive noise reduction method based on frequency point gain smoothing of claim 1, wherein the step of:

ANS noise reduction, gain is obtained by using a noise reduction algorithm, and the step 3 is performed;

post filtering, namely adjusting the gains, namely steps 4,5 and 6;

3. The adaptive noise reduction method based on frequency bin gain smoothing of claim 1, further comprising the step of:

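Step 7, updating the spectrum signal: $\hat{S}_t(k) = g'_t(k)\, Y_t(k)$, where $g'_t(k)$ is the smoothed gain and $Y_t(k)$ is the noisy spectrum of frame $t$;

Step 8, returning to the time domain signal by inverse Fourier transform: $\hat{s}_t = \mathcal{F}^{-1}(\hat{S}_t)$;

4. The adaptive noise reduction method based on frequency bin gain smoothing as defined in claim 3, wherein when the frame length $N$ is twice the frame shift $M$, i.e. $N = 2M$, and the analysis window $w$ is a Hanning window, the synthesis window is a unit window;

the speech signal $\hat{s}(n)$ is synthesized frame by frame in overlap-add mode.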

5. The adaptive noise reduction method based on frequency point gain smoothing as defined in claim 1, wherein for an audio file with a sampling rate of 16 kHz, 32 ms is taken as one frame with a 16 ms frame shift, i.e. a frame length of $N = 512$ samples and a frame shift of $M = 256$ samples.

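6. The adaptive noise reduction method based on frequency bin gain smoothing of claim 1, wherein the attenuation coefficient $\alpha$ in the forward smoothing and the backward smoothing takes the same or different values;

The exponential decay term, i.e. the previous adjusted gain multiplied by $\alpha$, is replaced by a linear decay term.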

7. An adaptive noise reduction post-filter based on frequency bin gain smoothing, characterized in that the post-filter is processed with the noise reduction method according to any of claims 1-6.

8. The adaptive noise reduction post-filter based on frequency bin gain smoothing as claimed in claim 7, wherein the post-filter is independently disposed for processing at the time of forward smoothing, backward smoothing and time smoothing, respectively.