CN119418712A

CN119418712A - A noise reduction method for real-time speech at the edge

Info

Publication number: CN119418712A
Application number: CN202510018663.4A
Authority: CN
Inventors: 敬运腾; 陈瀚轩; 王诗音
Original assignee: Xi'an Saipute Information Technology Co ltd
Current assignee: Xi'an Saipute Information Technology Co ltd
Priority date: 2025-01-07
Filing date: 2025-01-07
Publication date: 2025-02-11

Abstract

The application belongs to the technical field of voice noise reduction and provides a noise reduction method for real-time voice at the edge. In the disclosed embodiments, pre-emphasis, framing, and windowing are performed on an input speech signal to boost the high-frequency content of the signal and reduce signal distortion. The preprocessed input speech signal is subjected to a fast Fourier transform to convert the time-domain signal into an original frequency-domain signal. The mel-frequency cepstral coefficients of the original frequency-domain signal are calculated to extract the MFCC characteristic parameters of the speech. A lightweight separable convolutional neural network performs noise estimation on the MFCC characteristic parameters, which reduces the parameter count and the amount of computation and achieves efficient noise estimation. Adaptive iterative Wiener filtering is performed using the generated noise estimate, gradually eliminating noise in the speech signal to obtain a filtered frequency-domain signal. An inverse fast Fourier transform is performed on the filtered frequency-domain signal to recover the time-domain signal, which is output as the noise-reduced speech signal.

Description

Noise reduction method for edge-end real-time voice

Technical Field

The embodiment of the disclosure relates to the technical field of voice noise reduction, in particular to a noise reduction method for real-time voice at an edge end.

Background

With the rapid development of information technology, voice communication and voice recognition technologies are widely used in various fields. In particular, in smart phones, smart assistants, teleconferencing, and other scenarios, clear speech signal transmission and processing are critical. However, in practical applications, the speech signal is often subjected to various noise, which may come from the environment (e.g., traffic noise, crowd noise, etc.) or the device itself (e.g., poor microphone quality, unstable signal transmission, etc.). These noises can significantly reduce the clarity and intelligibility of the speech signal and even lead to information delivery errors.

At present, voice noise reduction is mainly divided into traditional filter-based noise reduction and deep-learning-based noise reduction. Traditional filter-based methods require little computation and are suitable for offline real-time noise reduction at the edge, but their noise reduction effect is poor; non-stationary noise in particular is difficult to filter out. Deep-learning-based schemes achieve an excellent noise reduction effect, but their models require a huge amount of computation and are difficult to deploy in edge scenarios where power consumption and computing resources are limited.

Accordingly, there is a need to improve one or more problems in the related art as described above.

It is noted that this section is intended to provide a background or context for the technical solutions of the present disclosure as set forth in the claims. The description herein is not admitted to be prior art by inclusion in this section.

Disclosure of Invention

An object of the embodiments of the present disclosure is to provide a noise reduction method for edge-side real-time voice, which overcomes one or more of the problems due to the limitations and disadvantages of the related art at least to some extent.

According to an embodiment of the present disclosure, there is provided a method for noise reduction of edge-side real-time speech, the method including:

Acquiring an input voice signal, and preprocessing the input voice signal to obtain a plurality of frames of voice signals;

For each frame of the voice signal, performing a fast Fourier transform on that frame to obtain an original frequency domain signal;

calculating a mel frequency cepstrum coefficient of the original frequency domain signal according to the original frequency domain signal so as to extract MFCC characteristic parameters of the input voice signal;

Inputting the MFCC characteristic parameters into a separable convolutional neural network subjected to INT8 fixed-point quantization processing to obtain a noise estimation result, wherein the separable convolutional neural network comprises an input layer, a first depth separable convolutional layer, a second depth separable convolutional layer, a full connection layer and an output layer which are sequentially connected in series, and the noise estimation result comprises a voice frame and a non-voice frame;

performing adaptive iterative wiener filtering on the voice frame and the non-voice frame by using a wiener filter to obtain a filtered frequency domain signal;

performing inverse fast fourier transform on the filtered frequency domain signal to obtain a filtered time domain signal;

Traversing all the voice signals to obtain the filtered time domain signals corresponding to all the voice signals, and combining all the filtered time domain signals to obtain noise-reducing voice signals.

Further, the step of obtaining an input voice signal and preprocessing the input voice signal includes:

Pre-emphasis the input speech signal with a pre-emphasis filter;

Dividing the pre-emphasized input voice signal into a plurality of frames of voice signals;

The speech signal is windowed for each frame using a hamming window to reduce spectral leakage.

Further, the expression of the pre-emphasis filter is:

$$y(n) = x(n) - \alpha\, x(n-1)$$

Wherein, $x(n)$ is the value of the original input signal at the $n$-th sampling point, $x(n-1)$ is the value of the original input signal at the $(n-1)$-th sampling point, $y(n)$ is the value of the pre-emphasized input signal at the $n$-th sampling point, and $\alpha$ is the pre-emphasis coefficient;

the Hamming window expression is:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$$

Wherein, $w(n)$ is the value of the Hamming window function at the $n$-th sampling point, $N$ is the number of sampling points per frame, and $n$ is the index of the current sampling point, with range $0 \le n \le N-1$;

The windowed expression of each frame of the voice signal $s(n)$ is:

$$s_w(n) = s(n)\, w(n)$$

Wherein, $s(n)$ is the value of the voice signal at the $n$-th sampling point, and $s_w(n)$ is the value of the windowed voice signal at the $n$-th sampling point.

Further, the step of calculating mel-frequency cepstrum coefficients of the original frequency-domain signal to extract MFCC characteristic parameters of the input speech signal includes:

Using a mel filter bank to convert the frequency $f$ of the original frequency domain signal to the mel frequency $f_{\text{mel}}$, wherein the mel filter bank comprises a plurality of mel filters;

According to the mel frequency $f_{\text{mel}}$ and the original frequency domain signal, calculating the energy of each mel filter;

and carrying out discrete cosine transform on the logarithm of the energy of all the mel filters to obtain the MFCC characteristic parameters.

Further, the mel frequency is expressed as:

$$f_{\text{mel}} = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$

the energy of the mel filter is calculated as follows:

$$E_m = \sum_{k=f_{\text{start}}}^{f_{\text{end}}} |X(k)|^2\, H_m(k)$$

Wherein, $k$ is the frequency bin, $X(k)$ is the original frequency domain signal, $|X(k)|^2$ is the power spectral density of the original frequency domain signal at the $k$-th frequency bin, $H_m(k)$ is the frequency response of the $m$-th mel filter at the $k$-th frequency bin, $f_{\text{start}}$ is the start frequency of the mel filter, and $f_{\text{end}}$ is the end frequency of the mel filter;

The discrete cosine transform expression is:

$$c_i = \sum_{m=1}^{M} \log(E_m) \cos\left(\frac{\pi i\,(m - 0.5)}{M}\right)$$

Wherein, $c_i$ is the $i$-th MFCC characteristic parameter, $M$ is the number of mel filters, and $E_m$ is the energy of the $m$-th mel filter.

Further, the step of inputting the MFCC characteristic parameter into the separable convolutional neural network after the INT8 fixed-point quantization process to obtain the noise estimation result includes:

Sequentially inputting the MFCC characteristic parameters into the input layer, the first depth separable convolution layer and the second depth separable convolution layer, flattening the output of the second depth separable convolution layer and inputting the flattened output into the fully connected layer to generate the noise estimation result;

The output layer outputs the noise estimation result, wherein if the VAD signal of the noise estimation result is higher than a preset threshold value, the voice signal of the current frame is the non-voice frame, and if the VAD signal of the noise estimation result is smaller than or equal to the threshold value, the voice signal of the current frame is the voice frame.

Further, the first depth-separable convolution layer includes a first depth convolution and a first point-wise convolution, and the second depth-separable convolution layer includes a second depth convolution and a second point-wise convolution.

Further, the step of performing adaptive iterative wiener filtering on the speech frame and the non-speech frame by using a wiener filter to obtain a filtered frequency domain signal includes:

The average power spectrum of the mute section is adopted as an initial noise power spectrum estimation, and a preset constant is adopted as a voice power spectrum;

If the speech signal of the current frame is the non-speech frame, updating the noise power spectrum estimation:

$$\hat{N}'(k) = \lambda\, \hat{N}(k) + (1 - \lambda)\, |X(k)|^2$$

Wherein, $\hat{N}(k)$ is the noise power spectrum estimate of the $k$-th frequency bin of the current frame, $\hat{N}'(k)$ is the updated noise power spectrum estimate of the $k$-th frequency bin of the current frame, $|X(k)|^2$ is the power spectral density of the $k$-th frequency bin of the current frame, and $\lambda$ is a smoothing coefficient;

if the voice signal of the current frame is the voice frame, maintaining the noise power spectrum estimation unchanged;

calculating the gain $G(k)$ of the wiener filter according to the updated noise power spectrum estimation $\hat{N}'(k)$ and the voice power spectrum $\hat{S}(k)$:

$$G(k) = \frac{\hat{S}(k)}{\hat{S}(k) + \hat{N}'(k)}$$

Performing adaptive iterative wiener filtering on the original frequency domain signal of the current frame by using the wiener filter to obtain a filtered frequency domain signal, wherein the adaptive iterative wiener filtering has the following expression:

$$Y(k) = G(k)\, X(k)$$

Wherein, $Y(k)$ is the filtered frequency domain signal, and $X(k)$ is the original frequency domain signal.

The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:

In the embodiments of the present disclosure, with the above noise reduction method for edge-side real-time voice, on the one hand, pre-emphasis, framing, and windowing are performed on the input voice signal to boost the high-frequency content of the signal and reduce signal distortion. The preprocessed input voice signal is subjected to a fast Fourier transform to convert the time-domain signal into an original frequency-domain signal. The mel-frequency cepstral coefficients of the original frequency-domain signal are calculated to extract the MFCC characteristic parameters of the voice. A lightweight separable convolutional neural network performs noise estimation on the MFCC characteristic parameters, which reduces the parameter count and the amount of computation and achieves efficient noise estimation. Adaptive iterative Wiener filtering is performed using the generated noise estimate, gradually eliminating noise in the voice signal to obtain a filtered frequency-domain signal. An inverse fast Fourier transform is performed on the filtered frequency-domain signal to recover the time-domain signal, which is output as the noise-reduced voice signal. On the other hand, the method can perform offline real-time noise reduction in edge scenarios with limited power consumption and few computing resources, and achieves a better noise reduction effect.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.

FIG. 1 is a step diagram of a method for edge-side real-time speech noise reduction in an exemplary embodiment of the present disclosure;

FIG. 2 shows a schematic of synthesized noisy speech audio at -20 dB and its noise-reduced audio in an exemplary embodiment of the present disclosure;

FIG. 3 shows a schematic of a call recording with environmental noise and its noise-reduced audio in an exemplary embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein, but rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the drawings are merely schematic illustrations of embodiments of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities.

The embodiment provides a noise reduction method for edge-side real-time voice. Referring to fig. 1, the method for noise reduction of the edge-side real-time voice may include steps S101 to S107.

Step S101, acquiring an input voice signal, and preprocessing the input voice signal to obtain a plurality of frames of voice signals;

Step S102, for each frame of the voice signal, performing a fast Fourier transform on that frame to obtain an original frequency domain signal;

Step S103, calculating the Mel frequency cepstrum coefficient of the original frequency domain signal to extract the MFCC characteristic parameters of the input voice signal;

Step S104, inputting the MFCC characteristic parameters into a separable convolutional neural network subjected to INT8 fixed-point quantization processing to obtain a noise estimation result, wherein the separable convolutional neural network comprises an input layer, a first depth separable convolutional layer, a second depth separable convolutional layer, a full connection layer and an output layer which are sequentially serial, and the noise estimation result comprises a voice frame and a non-voice frame;

Step S105, performing adaptive iterative wiener filtering on the voice frame and the non-voice frame by using a wiener filter to obtain a filtered frequency domain signal;

step S106, performing inverse fast Fourier transform on the filtered frequency domain signal to obtain a filtered time domain signal;

Step S107, traversing all the voice signals to obtain the filtered time domain signals corresponding to all the voice signals, and combining all the filtered time domain signals to obtain the noise-reduced voice signals.

According to the above noise reduction method for edge-side real-time voice, on the one hand, pre-emphasis, framing, and windowing are performed on the input voice signal to boost the high-frequency content of the signal and reduce signal distortion. The preprocessed input voice signal is subjected to a fast Fourier transform to convert the time-domain signal into an original frequency-domain signal. The mel-frequency cepstral coefficients of the original frequency-domain signal are calculated to extract the MFCC characteristic parameters of the voice. A lightweight separable convolutional neural network performs noise estimation on the MFCC characteristic parameters, which reduces the parameter count and the amount of computation and achieves efficient noise estimation. Adaptive iterative Wiener filtering is performed using the generated noise estimate, gradually eliminating noise in the voice signal to obtain a filtered frequency-domain signal. An inverse fast Fourier transform is performed on the filtered frequency-domain signal to recover the time-domain signal, which is output as the noise-reduced voice signal. On the other hand, the method can perform offline real-time noise reduction in edge scenarios with limited power consumption and few computing resources, and achieves a better noise reduction effect.

Next, each step of the above-described edge-side real-time voice noise reduction method in the present exemplary embodiment will be described in more detail with reference to fig. 1 to 3.

In step S101, an input speech signal is acquired and preprocessed to obtain a number of frames of speech signals.

The method comprises the steps of pre-emphasizing an input voice signal by using a pre-emphasizing filter, dividing the pre-emphasized input voice signal into a plurality of frames of voice signals, and windowing each frame of voice signal by using a Hamming window to reduce frequency spectrum leakage.

In one embodiment, the input speech signal is pre-emphasized to enhance its high-frequency content. Specifically, a pre-emphasis filter is applied to the input signal $x(n)$:

$$y(n) = x(n) - \alpha\, x(n-1)$$

Wherein, $x(n)$ is the value of the original input signal at the $n$-th sampling point, $x(n-1)$ is the value of the original input signal at the $(n-1)$-th sampling point, $y(n)$ is the value of the pre-emphasized input signal at the $n$-th sampling point, and $\alpha$ is the pre-emphasis coefficient, which controls the degree of enhancement of the high-frequency component; for example, $\alpha$ takes the value 0.95.

The pre-emphasized signal is divided into frames, each frame being 10 ms in length with a frame shift of 5 ms. A Hamming window is applied to each frame of the signal to reduce spectral leakage. The windowing formula is as follows:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$$

Wherein, $w(n)$ is the value of the Hamming window function at the $n$-th sampling point; $N$ is the number of sampling points per frame, namely the frame length; and $n$ is the index of the current sampling point, with range $0 \le n \le N-1$.

When the Hamming window is applied, each frame of the speech signal $s(n)$ is windowed as follows:

$$s_w(n) = s(n)\, w(n)$$
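The preprocessing chain of step S101 can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the 16 kHz sampling rate is an assumption (the text gives only the 10 ms frame length and 5 ms frame shift), the function names are hypothetical, and the random input merely stands in for recorded audio.

```python
import numpy as np

SAMPLE_RATE = 16000                      # assumption: not stated in the text
FRAME_LEN = SAMPLE_RATE * 10 // 1000     # 10 ms frame  -> 160 samples
HOP_LEN = SAMPLE_RATE * 5 // 1000        # 5 ms shift   -> 80 samples

def pre_emphasis(x, alpha=0.95):
    """y(n) = x(n) - alpha * x(n-1); the first sample is passed through unchanged."""
    x = np.asarray(x, dtype=np.float64)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

def frame_signal(x, frame_len, hop_len):
    """Split x into overlapping frames; trailing samples that do not fill a frame are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return x[idx]

def hamming(n_points):
    """Symmetric Hamming window, 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    n = np.arange(n_points)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (n_points - 1))

# One second of placeholder audio (random noise stands in for a recording).
signal = np.random.default_rng(0).standard_normal(SAMPLE_RATE)
frames = frame_signal(pre_emphasis(signal), FRAME_LEN, HOP_LEN) * hamming(FRAME_LEN)
print(frames.shape)  # (199, 160)
```

With a 5 ms shift on 10 ms frames, consecutive frames overlap by half, so each input sample (apart from the edges) contributes to two windowed frames.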

In step S102, for one frame of the speech signal, a fast fourier transform is performed on the speech signal to obtain an original frequency domain signal.

Specifically, the windowed signal of each frame is subjected to a fast fourier transform (Fast Fourier transform, FFT), and the time domain signal is converted into a frequency domain signal, so as to obtain a complex frequency spectrum.
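As a small illustration of step S102, the sketch below transforms one windowed frame with NumPy's real-input FFT. The 16 kHz sampling rate and the 1 kHz test tone are assumptions chosen so that the result is easy to check; they do not come from the text.

```python
import numpy as np

SAMPLE_RATE = 16000   # assumption: not stated in the text
FRAME_LEN = 160       # 10 ms at the assumed sampling rate

# A 1 kHz test tone, windowed with a Hamming window as in step S101.
n = np.arange(FRAME_LEN)
frame = np.hamming(FRAME_LEN) * np.sin(2.0 * np.pi * 1000.0 * n / SAMPLE_RATE)

spectrum = np.fft.rfft(frame)       # complex spectrum, FRAME_LEN // 2 + 1 bins
power = np.abs(spectrum) ** 2       # power spectral density per bin
peak_bin = int(np.argmax(power))
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(spectrum.shape, peak_bin, peak_hz)  # (81,) 10 1000.0
```

Because the input is real, only the non-negative frequency bins are needed, which is why `rfft` returns 81 bins for a 160-sample frame.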

In step S103, the mel-frequency cepstrum coefficient thereof is calculated from the original frequency-domain signal to extract the MFCC characteristic parameters of the input speech signal.

Specifically, the mel filter bank is used to convert the frequency $f$ of the original frequency domain signal to the mel frequency $f_{\text{mel}}$, wherein the mel filter bank comprises several mel filters; the energy of each mel filter is calculated according to the mel frequency $f_{\text{mel}}$ and the original frequency domain signal; and the logarithm of the energy of all mel filters is taken and subjected to discrete cosine transform to obtain the MFCC characteristic parameters.

In one embodiment, mel frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC) of the frequency domain signal are calculated. First, the spectrum is passed through a mel filter bank, and the energy of each filter is calculated. The filter energy is then logarithmically measured and Discrete Cosine Transformed (DCT) to yield MFCC features. The specific process is as follows:

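First, a mel filter bank (e.g., 40 filters are used in this embodiment) is used to convert the frequency $f$ to the mel frequency $f_{\text{mel}}$. The mel frequency formula is as follows:

$$f_{\text{mel}} = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$

For each filter, calculate its output energy $E_m$:

$$E_m = \sum_{k=f_{\text{start}}}^{f_{\text{end}}} |X(k)|^2\, H_m(k)$$

where $|X(k)|^2$ is the power spectral density of the original frequency domain signal at the $k$-th frequency bin, $H_m(k)$ is the frequency response of the $m$-th mel filter at that bin, and the sum runs from the start frequency $f_{\text{start}}$ to the end frequency $f_{\text{end}}$ of the filter.

The expression of the discrete cosine transform is:

$$c_i = \sum_{m=1}^{M} \log(E_m) \cos\left(\frac{\pi i\,(m - 0.5)}{M}\right)$$

Wherein, $c_i$ is the $i$-th MFCC characteristic parameter (e.g., $i$ runs up to 10 in this embodiment), $M$ is the number of mel filters (e.g., $M = 40$ in this embodiment), and $E_m$ is the energy of the $m$-th mel filter.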
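The MFCC extraction of step S103 can be sketched as below, using 40 filters and 10 coefficients as in the embodiment. Several details are assumptions made only for illustration: the 16 kHz sampling rate, zero-padding the frame to a 512-point FFT, triangular filter shapes, a DCT index running from 1 to 10, and a small floor added before the logarithm. All function names are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16000   # assumption: not stated in the text
N_FFT = 512           # assumption: frame zero-padded to 512 points
N_FILTERS = 40        # number of mel filters in the embodiment
N_COEFFS = 10         # number of MFCC parameters in the embodiment

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters (an assumed shape) with edges equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(power_spectrum, bank, n_coeffs):
    """Filter energies -> logarithm -> discrete cosine transform."""
    energies = bank @ power_spectrum
    log_e = np.log(energies + 1e-10)          # small floor avoids log(0)
    n_filters = len(log_e)
    m = np.arange(1, n_filters + 1)
    i = np.arange(1, n_coeffs + 1)[:, None]   # assumption: i = 1..n_coeffs
    basis = np.cos(np.pi * i * (m - 0.5) / n_filters)
    return basis @ log_e

# Placeholder frame: windowed random noise stands in for real speech.
frame = np.hamming(160) * np.random.default_rng(0).standard_normal(160)
power = np.abs(np.fft.rfft(frame, n=N_FFT)) ** 2
bank = mel_filterbank(N_FILTERS, N_FFT, SAMPLE_RATE)
coeffs = mfcc(power, bank, N_COEFFS)
print(coeffs.shape)  # (10,)
```

Zero-padding is used here because the lowest mel filters are narrower than the 100 Hz bin spacing of an unpadded 160-point FFT at the assumed sampling rate.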

In step S104, the MFCC characteristic parameters are input into a separable convolutional neural network subjected to INT8 fixed-point quantization processing to obtain a noise estimation result, wherein the separable convolutional neural network comprises an input layer, a first depth separable convolutional layer, a second depth separable convolutional layer, a full connection layer and an output layer which are sequentially connected in series, and the noise estimation result comprises a voice frame and a non-voice frame.

More specifically, inputting the MFCC characteristic parameters into the input layer, the first depth separable convolution layer and the second depth separable convolution layer in sequence, flattening the output of the second depth separable convolution layer, and inputting the flattened output into the full connection layer to generate a noise estimation result;

The output layer outputs a noise estimation result, wherein if the VAD signal of the noise estimation result is higher than a preset threshold value, the voice signal of the current frame is a non-voice frame, and if the VAD signal of the noise estimation result is smaller than or equal to the threshold value, the voice signal of the current frame is a voice frame.

In addition, the first depth-separable convolution layer includes a first depth convolution and a first point-wise convolution, and the second depth-separable convolution layer includes a second depth convolution and a second point-wise convolution.

In one embodiment, a lightweight separable convolutional neural network (i.e., separable convolutional neural network) is constructed, and MFCC characteristics (i.e., MFCC characteristic parameters) are input into the designed lightweight separable convolutional neural network. The network consists of a plurality of depth separable convolution layers, and fixed-point quantization processing is carried out, so that a noise estimation result can be obtained quickly and efficiently. The method comprises the following specific steps:

(1) Input features: the mel-frequency cepstral coefficients (MFCCs) of each frame of signal were calculated in the previous step, and these MFCC features are used as the input to the neural network.

(2) Light separable convolutional neural network design

1) Network structure:

Input layer: receives the two-dimensional MFCC feature vector (number of sampling points per frame × number of features).

Depth separable convolution layers (2 layers): depth separable convolution is used to reduce the number of parameters and the amount of computation. Each convolution layer is divided into a depth convolution (convolution kernel 10×2) and a point-by-point convolution (convolution kernel 1×1), with 32 channels.

Activation function: a ReLU activation function is applied after each separable convolution layer.

Full connection layer: the output of the convolution layers is flattened and input into the full connection layer to generate the noise estimation result.

2) Network training

Data preparation: noisy speech and clean speech are labeled 1, and pure noise data is labeled 0; 15 hours of speech data are used for training.

Loss function: a mean square error loss function is used.

Quantization-aware training: to obtain a lightweight model, quantization-aware training is applied to the network structure, yielding a lightweight network after fixed-point quantization and reducing the amount of network computation.
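To make the idea of INT8 fixed-point quantization concrete, the sketch below quantizes a weight tensor to 8-bit integers and maps it back to floating point. The symmetric per-tensor scheme and the function names are assumptions for illustration; the text does not specify which quantization scheme or toolchain is used.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w is approximated by q * scale, q in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to floating point."""
    return q.astype(np.float32) * scale

weights = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = float(np.max(np.abs(weights - restored)))
print(q.dtype, max_error <= scale / 2 + 1e-6)
```

During quantization-aware training, a quantize-then-dequantize round trip like this is typically inserted in the forward pass so that the network learns weights that tolerate the rounding error.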

In a specific embodiment, the output of each layer of the neural network is as follows:

Input layer: 160×10.

First depth separable convolution: output 151×32.

Second depth separable convolution: output 151×32.

After ReLU activation: 151×32.

Full connection layer: the input is flattened to 4832.
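The layer shapes listed above can be reproduced with a small NumPy forward pass. This sketch rests on several assumptions that are not stated in the text: the 160×10 input is treated as a length-160 sequence with 10 channels, the depth convolution uses a length-10 kernel along the first axis (the 10×2 kernel is not reproduced exactly), the second layer uses "same" padding so that the length stays 151, and the weights are random placeholders rather than trained values.

```python
import numpy as np

def depthwise_conv1d(x, kernels, padding="valid"):
    """x: (T, C); kernels: (K, C). Each channel is filtered by its own kernel."""
    k_len = kernels.shape[0]
    if padding == "same":
        left = (k_len - 1) // 2
        x = np.pad(x, ((left, k_len - 1 - left), (0, 0)))
    t_out = x.shape[0] - k_len + 1
    windows = np.stack([x[t:t + k_len] for t in range(t_out)])  # (T_out, K, C)
    return np.einsum("tkc,kc->tc", windows, kernels)

def pointwise_conv(x, weights):
    """1x1 convolution: mixes channels at each position. weights: (C_in, C_out)."""
    return x @ weights

def separable_block(x, dw, pw, padding):
    """Depth convolution, then point-by-point convolution, then ReLU."""
    return np.maximum(pointwise_conv(depthwise_conv1d(x, dw, padding), pw), 0.0)

rng = np.random.default_rng(0)
features = rng.standard_normal((160, 10))                      # input layer: 160 x 10
dw1, pw1 = rng.standard_normal((10, 10)), rng.standard_normal((10, 32))
dw2, pw2 = rng.standard_normal((10, 32)), rng.standard_normal((32, 32))

h1 = separable_block(features, dw1, pw1, "valid")              # (151, 32)
h2 = separable_block(h1, dw2, pw2, "same")                     # (151, 32)
flat = h2.reshape(-1)                                          # 4832 values
vad_score = float(flat @ rng.standard_normal(flat.size))       # fully connected layer

separable_params = dw1.size + pw1.size                         # 100 + 320 = 420
standard_params = 10 * 10 * 32                                 # full 10-tap conv: 3200
print(h1.shape, h2.shape, flat.size, separable_params, standard_params)
```

Under these assumptions the first separable layer needs 420 weights where an ordinary convolution with the same kernel length and channel counts would need 3200, which illustrates the parameter saving the text attributes to depth separable convolution.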

In step S105, adaptive iterative wiener filtering is performed on the speech frame and the non-speech frame, respectively, using wiener filters to obtain filtered frequency domain signals.

Specifically, a key role of noise estimation using a neural network (i.e., a separable convolutional neural network) is to determine whether a speech signal is present in the current frame. In the adaptive iterative wiener filtering step, it is determined whether to update the parameters of the wiener filter based on the result of the voice activity detection.

(1) Voice Activity Detection (VAD)

In the noise estimation step, a lightweight separable convolutional neural network is used to indicate whether the current frame contains speech. A threshold is set (in this embodiment, the threshold is set to 0.8), and if the VAD signal output by the network is higher than the threshold, the current frame is determined to be a non-speech frame.

(2) Wiener filter initialization

At the beginning of the process, the average power spectrum of the silence segment is used as the initial noise power spectrum estimate, and a small constant value (0.0001) is used as the initial speech power spectrum.

(3) Iterative updating

Non-speech frame processing: update the noise power spectrum

If the current frame is determined to be a non-speech frame, the noise power spectrum estimate is updated:

$$\hat{N}'(k) = \lambda\, \hat{N}(k) + (1 - \lambda)\, |X(k)|^2$$

Wherein, $\hat{N}(k)$ is the noise power spectrum estimate of the $k$-th frequency bin of the current frame, $\hat{N}'(k)$ is the updated noise power spectrum estimate of the $k$-th frequency bin of the current frame, $|X(k)|^2$ is the power spectral density of the $k$-th frequency bin of the current frame, and $\lambda$ is the smoothing coefficient, which takes the value 0.998 in this embodiment.

Speech frame processing: keep the noise power spectrum unchanged

If the current frame is determined to be a speech frame, the noise power spectrum estimate is not updated.
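The VAD-gated noise update can be sketched as follows. The threshold 0.8 and smoothing coefficient 0.998 come from the embodiment, and the decision rule follows the text as written (a VAD signal above the threshold marks a non-speech frame). The recursive-averaging form of the update and the function names are assumptions for illustration.

```python
import numpy as np

VAD_THRESHOLD = 0.8   # threshold from the embodiment
SMOOTHING = 0.998     # smoothing coefficient from the embodiment

def is_non_speech(vad_score, threshold=VAD_THRESHOLD):
    """Decision rule as stated in the text: a score above the threshold means non-speech."""
    return vad_score > threshold

def update_noise_psd(noise_psd, frame_psd, vad_score, lam=SMOOTHING):
    """Recursive average on non-speech frames; the estimate is kept as is on speech frames."""
    if is_non_speech(vad_score):
        return lam * noise_psd + (1.0 - lam) * frame_psd
    return noise_psd

noise_psd = np.ones(4)
frame_psd = np.full(4, 3.0)

updated = update_noise_psd(noise_psd, frame_psd, vad_score=0.9)    # non-speech frame
unchanged = update_noise_psd(noise_psd, frame_psd, vad_score=0.5)  # speech frame
print(updated, unchanged)
```

With a coefficient as close to 1 as 0.998, each non-speech frame moves the estimate only 0.2% of the way toward the current frame's power, so the noise spectrum is tracked slowly and smoothly.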

(4) Wiener filter computation

The wiener filter gain $G(k)$ is calculated from the updated noise power spectrum $\hat{N}'(k)$ and the speech power spectrum $\hat{S}(k)$:

$$G(k) = \frac{\hat{S}(k)}{\hat{S}(k) + \hat{N}'(k)}$$

The wiener filter is then applied to filter the frequency domain signal of the current frame:

$$Y(k) = G(k)\, X(k)$$

Wherein, $Y(k)$ is the filtered frequency domain signal, and $X(k)$ is the original frequency domain signal.
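The gain computation and filtering step can be illustrated with the short sketch below. It assumes the textbook Wiener gain, speech power divided by the sum of speech and noise power, and adds a small constant to the denominator to avoid division by zero; the function names and the example values are hypothetical.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, eps=1e-12):
    """Per-bin gain in [0, 1): close to 1 where speech dominates, close to 0 where noise does."""
    return speech_psd / (speech_psd + noise_psd + eps)

def apply_wiener(spectrum, speech_psd, noise_psd):
    """Scale each complex frequency bin by its real-valued gain (the phase is unchanged)."""
    return wiener_gain(speech_psd, noise_psd) * spectrum

spectrum = np.array([2.0 + 0.0j, 0.0 + 4.0j, 1.0 + 1.0j])
speech_psd = np.array([3.0, 1.0, 0.0])
noise_psd = np.array([1.0, 1.0, 1.0])

gain = wiener_gain(speech_psd, noise_psd)
filtered = apply_wiener(spectrum, speech_psd, noise_psd)
print(np.round(gain, 3), np.round(filtered, 3))
```

Because the gain is real and non-negative, the filter attenuates the magnitude of each bin while leaving its phase untouched, which is what allows the inverse FFT to reconstruct a time-domain frame directly.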

In steps S106 and S107, an inverse fast Fourier transform is performed on the filtered frequency domain signal to obtain a filtered time domain signal; all the voice signals are traversed to obtain the filtered time domain signals corresponding to all the voice signals; and all the filtered time domain signals are combined to obtain the noise-reduced voice signal.

Specifically, the filtered frequency domain signal is subjected to an inverse fast Fourier transform (Inverse Fast Fourier Transform, IFFT) to recover the time domain signal. The time domain signals of all frames are then combined to obtain the final noise-reduced voice signal.
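One common way to combine overlapping frames is overlap-add, sketched below. The text only says that the frames are combined, so the choice of overlap-add is an assumption, as are the 160-sample frame length and 80-sample shift (10 ms and 5 ms at an assumed 16 kHz sampling rate). The spectra here are simply the FFT of random frames and stand in for Wiener-filtered spectra.

```python
import numpy as np

def overlap_add(frames, hop_len):
    """Sum frames back into one signal, each shifted by hop_len samples."""
    n_frames, frame_len = frames.shape
    out = np.zeros(hop_len * (n_frames - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop_len:i * hop_len + frame_len] += frame
    return out

FRAME_LEN, HOP_LEN = 160, 80   # assumed 10 ms frames with a 5 ms shift at 16 kHz

rng = np.random.default_rng(0)
time_frames = rng.standard_normal((5, FRAME_LEN))
filtered_spectra = np.fft.rfft(time_frames, axis=1)   # stand-in for filtered spectra

# Inverse FFT of every frame, then recombination into a single signal.
recovered = np.fft.irfft(filtered_spectra, n=FRAME_LEN, axis=1)
denoised = overlap_add(recovered, HOP_LEN)
print(recovered.shape, denoised.shape)  # (5, 160) (480,)
```

Passing `n=FRAME_LEN` to `irfft` makes the output length explicit, which matters when the frame length is odd or the spectrum was computed with zero-padding.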

In a specific embodiment, the noise reduction method for edge-side real-time voice proposed by the application (i.e., the RTC method) is compared with existing noise reduction methods (specifically RNNoise, DeepFilterNet, and S-DCCRN). The proposed method consumes only 0.8 W of power, takes only 4 ms to process one 10 ms frame of audio, and achieves noise suppression of more than 25 dB in a strong-noise environment. Compared with the traditional method and the pure deep learning schemes, it has fewer parameters, requires less computation, and gives a better noise reduction effect.

Table 1 comparison of noise reduction methods

FIG. 2 shows the synthesized noisy speech audio at -20 dB and the corresponding noise-reduced audio.

FIG. 3 shows the call recording with environmental noise and the corresponding noise-reduced audio.

Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present disclosure, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

In the embodiments of the present disclosure, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured" and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed, mechanically connected, electrically connected, directly connected, indirectly connected via an intervening medium, or in communication between two elements or in an interaction relationship between two elements. The specific meaning of the terms in this disclosure will be understood by those of ordinary skill in the art as the case may be.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, one skilled in the art can combine the different embodiments or examples described in this specification.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. The method for reducing noise of the edge-side real-time voice is characterized by comprising the following steps:

2. The method for noise reduction of edge-side real-time speech according to claim 1, wherein the step of obtaining an input speech signal and preprocessing the input speech signal comprises:

Pre-emphasis the input speech signal with a pre-emphasis filter;

3. The method for noise reduction of edge-side real-time speech according to claim 2, wherein the pre-emphasis filter has the expression:

the Hamming window expression is:

The windowed expression of each frame of the voice signal is:

4. The method of claim 3, wherein the step of calculating mel-frequency cepstrum coefficients of the original frequency-domain signal to extract MFCC characteristic parameters of the input speech signal comprises:

5. The method for noise reduction of edge-side real-time speech according to claim 4, wherein the mel frequency is expressed as:

the energy of the mel filter is calculated as follows:

The discrete cosine transform expression is:

6. The method for noise reduction of edge-side real-time speech according to claim 5, wherein the step of inputting the MFCC characteristic parameters into a separable convolutional neural network subjected to INT8 fixed-point quantization processing to obtain a noise estimation result comprises:

7. The method of edge-side real-time speech noise reduction according to claim 6, wherein the first depth separable convolution layer comprises a first depth convolution and a first point-by-point convolution, and the second depth separable convolution layer comprises a second depth convolution and a second point-by-point convolution.

8. The method for noise reduction of edge-side real-time speech according to claim 6, wherein the step of performing adaptive iterative wiener filtering on the speech frame and the non-speech frame, respectively, by using wiener filters to obtain filtered frequency domain signals comprises: