CN106960672B

CN106960672B - Bandwidth extension method and device for stereo audio

Info

Publication number: CN106960672B
Application number: CN201710203054.1A
Authority: CN
Inventors: 高昕; 颜永红; 邹潇湘; 白海钏; 舒敏; 云晓春; 王锟; 张震; 计哲; 董琳; 金暐; 王中华; 李海灵; 李佳
Original assignee: Institute of Acoustics CAS; National Computer Network and Information Security Management Center
Current assignee: Institute of Acoustics CAS; National Computer Network and Information Security Management Center
Priority date: 2017-03-30
Filing date: 2017-03-30
Publication date: 2020-08-21
Anticipated expiration: 2037-03-30
Also published as: CN106960672A

Abstract

The invention discloses a bandwidth extension method and device for stereo audio. The method comprises the following steps: decomposing the stereo signal into direct sound and diffuse sound; performing bandwidth extension on the diffuse sound according to a preset band extension method; separating the direct sound into a plurality of point sound sources in different directions and performing bandwidth extension on each of them to obtain a plurality of bandwidth-extended point sound sources; remixing the bandwidth-extended point sound sources according to pre-estimated azimuth information to obtain the bandwidth-extended direct sound; and reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound and the bandwidth-extended diffuse sound. This technical scheme addresses a problem in the prior art, in which the signal bandwidth is extended only according to the subjective quality of the signal reconstructed in each single channel, without considering the correlation between signal energy and phase in the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of the position and distance of a sound source.

Description

Bandwidth extension method and device for stereo audio

Technical Field

The invention relates to the field of network technology applications, and in particular to a bandwidth extension method and device for stereo audio.

Background

In digital audio signal processing, audio signals covering the entire 20 Hz to 20 kHz range perceivable by the human ear are usually called full-band audio; such signals are mainly used for high-fidelity reproduction of music. Current real-time audio communication systems cannot provide sufficient network transmission rate and terminal processing capacity, so they inevitably limit the effective bandwidth of the reconstructed signal and give priority to quantizing and encoding the low-frequency components of the audio signal, thereby improving the coding efficiency of the audio communication system.

Traditional telephone voice communication systems usually transmit narrowband signals, whose frequency content lies in the range of 300 to 3400 Hz, at a sampling rate of 8 kHz. Subjective listening tests show that narrowband speech preserves 91% of syllable intelligibility and 99% of sentence intelligibility. Compared with real speech, however, the naturalness and subjective quality of the narrowband signal transmitted in an actual call are noticeably reduced. Because the high-frequency components are missing, narrowband speech cannot clearly distinguish some unvoiced sounds and plosives, and its ability to convey speaker characteristics is weakened. To overcome these shortcomings, wideband audio has been widely adopted in telephone voice communication. Its effective bandwidth extends from 50 Hz to 7 kHz, which covers most of the spectrum that characterizes the important properties of speech and achieves a sound quality close to that of AM broadcasting. However, owing to historical, economic and technical constraints, a long transition period is still needed before traditional fixed and mobile communications fully move from narrowband to wideband audio.

As an effective audio enhancement method, band extension leaves the source coding and network transmission of the narrowband signal unchanged. By analyzing the time-frequency characteristics of the original audio signal, it artificially restores, in the audio reconstructed at the receiving end, the high-frequency components that were cut off at the encoder, thereby improving the perceived quality of the reconstructed audio. For hearing-impaired listeners, band extension can further improve phoneme and semantic discrimination. Over the past decade or so, many research institutions and researchers have proposed solutions for band extension of monophonic speech signals. These methods usually address spectral envelope extension and spectral detail extension separately and then synthesize the high-frequency components of the signal; the principle is shown in Fig. 1. First, time-frequency features are extracted from the narrowband signal according to the principles of human auditory perception. Next, the spectral envelope and energy of the high-frequency components are estimated using the mapping between low-frequency and high-frequency features described by side information or prior knowledge. At the same time, an appropriate spectral patching method is selected to extend the spectral detail. Finally, the extended spectral envelope and spectral detail are combined to reconstruct the high-frequency components of the wideband audio signal.
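The envelope-plus-detail scheme described above can be illustrated with a minimal sketch. It assumes magnitude spectra stored as plain Python lists, uses simple copy-up patching as the spectral patching step, and takes the high-band envelope level as given (in practice it would be estimated from low-band features). The function name `extend_band` and its parameters are hypothetical and do not come from the patent.

```python
import math

def extend_band(low_mag, high_env):
    """Copy-up patching followed by envelope shaping (illustrative only).

    low_mag  -- magnitude spectrum of the narrowband signal (list of floats)
    high_env -- assumed target RMS level for each added high-band block,
                one value per block of len(low_mag) bins
    """
    n = len(low_mag)
    # RMS of the low band, used to normalize the copied spectral detail.
    rms = math.sqrt(sum(m * m for m in low_mag) / n) or 1.0
    wide = list(low_mag)
    for level in high_env:
        # Spectral detail: reuse the low-band fine structure (copy-up).
        # Spectral envelope: rescale it to the estimated high-band level.
        wide.extend(m / rms * level for m in low_mag)
    return wide

low = [1.0, 0.8, 0.6, 0.4]
wide = extend_band(low, high_env=[0.2])
```

The low band is passed through unchanged, and only the added bins are shaped, which mirrors the idea of leaving the transmitted narrowband content untouched.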

For stereo audio, traditional band extension methods mostly reconstruct the high-frequency components of the two channels independently. Such methods extend the signal bandwidth only according to the subjective quality of the signal reconstructed in each single channel and do not consider the correlation between signal energy and phase in the two channels; the resulting stereo signal seriously impairs the listener's judgment of the position and distance of a sound source.

Summary of the Invention

In view of the above problems, the present invention provides a bandwidth extension method and device for stereo audio.

The bandwidth extension method for stereo audio provided by the present invention comprises the following steps:

decomposing the stereo signal into direct sound and diffuse sound;

performing bandwidth extension on the diffuse sound according to a preset band extension method;

separating the direct sound into a plurality of point sound sources in different directions, and performing bandwidth extension on each point sound source to obtain a plurality of bandwidth-extended point sound sources;

remixing the bandwidth-extended point sound sources according to pre-estimated azimuth information to obtain the bandwidth-extended direct sound;

reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.

The invention also provides a bandwidth extension device for stereo audio, comprising: a decomposition module, a diffuse sound extension module, a direct sound separation and extension module, and a reconstruction module;

the decomposition module is configured to decompose the stereo signal into direct sound and diffuse sound;

the diffuse sound extension module is configured to perform bandwidth extension on the diffuse sound according to a preset band extension method;

the direct sound separation and extension module is configured to separate the direct sound into a plurality of point sound sources in different directions, and to perform bandwidth extension on each point sound source to obtain a plurality of bandwidth-extended point sound sources;

the reconstruction module is configured to remix the bandwidth-extended point sound sources according to pre-estimated azimuth information to obtain the bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.

The beneficial effects of the present invention are as follows:

In the embodiments of the present invention, the input stereo signal is first decomposed into two components, direct sound and diffuse sound, using the spectral correlation between channels. The diffuse sound component is then extended directly with a traditional band extension method. The direct sound is separated into a plurality of point sound sources in different directions, based on the sparsity of different sound sources in the time-frequency structure, and each is bandwidth-extended separately. Finally, the extended point sound sources are remixed according to their direction information in the original stereo signal and combined with the bandwidth-extended diffuse sound component to reconstruct a wideband stereo audio signal. The invention addresses the problem in the prior art that the signal bandwidth is extended only according to the subjective quality of the signal reconstructed in each single channel, without considering the correlation between signal energy and phase in the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of the position and distance of a sound source.

Brief Description of the Drawings

Fig. 1 is a basic flowchart of a prior-art band extension method for monophonic speech signals;

Fig. 2 is a flowchart of the bandwidth extension method for stereo audio according to the method embodiment of the present invention;

Fig. 3 is a schematic structural diagram of the bandwidth extension device for stereo audio according to the device embodiment of the present invention;

Fig. 4 is a block diagram of the principle of the bandwidth extension method for stereo audio in Example 1 of the present invention;

Fig. 5 is a block diagram of the principle of the state-space model based on deep neural networks in Example 1 of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.

The prior art extends the signal bandwidth only according to the subjective quality of the signal reconstructed in each single channel, without considering the correlation between signal energy and phase in the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of the position and distance of a sound source. To solve this problem, the present invention provides a bandwidth extension method and device for stereo audio, described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

According to the method embodiment of the present invention, a bandwidth extension method for stereo audio is provided. Fig. 2 is a flowchart of this method. As shown in Fig. 2, the bandwidth extension method for stereo audio according to the method embodiment includes the following processing:

Step 201: decompose the stereo signal into direct sound and diffuse sound.

Specifically, step 201 includes the following steps:

splitting the stereo signal into a left channel and a right channel;

applying a time-frequency transform to the framed left channel and right channel respectively, to obtain the left-channel short-time spectral component and the right-channel short-time spectral component of the stereo signal;

obtaining, from the left-channel and right-channel short-time spectral components, the sum $P_{sum}$ of the left and right channel signal energy spectra, the difference $P_{diff}$ of the left and right channel signal energy spectra, and the cross-correlation $P_{cc}$ between the left and right channel signals;

obtaining the direct sound matrix from $P_{sum}$, $P_{diff}$ and $P_{cc}$ by the least squares method;

separating the direct sound from the stereo signal using the direct sound matrix;

subtracting the direct sound from the stereo signal to obtain the diffuse sound.

More specifically, obtaining $P_{sum}$, $P_{diff}$ and $P_{cc}$ from the left-channel and right-channel short-time spectral components includes:

computing the cross-correlation $P_{cc}$ from the left-channel short-time spectral component $S_L(t,f)$ and the right-channel short-time spectral component $S_R(t,f)$ according to $P_{cc} = R\{S_L(t,f)\,S_R^*(t,f)\}$, where $R\{\cdot\}$ takes the real part.

More specifically, separating the direct sound from the stereo signal using the direct sound matrix includes:

using the direct sound matrix $M_D(t,f)$ to separate the direct sound $S'(t,f)$ from the stereo signal $S(t,f)$ according to formula 1:

$$S'(t,f) = M_D(t,f)\,[S_L(t,f)\ \ S_R(t,f)]^T \qquad \text{(formula 1)}$$
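A minimal sketch of the per-bin quantities in step 201 follows. The expression for $P_{cc}$ is taken from the text; the expressions for $P_{sum}$ and $P_{diff}$ are an assumed reading of "sum" and "difference" of the channel energy spectra, since their formulas are not reproduced here. The 2x2 matrix passed to `apply_direct_matrix` is a placeholder, not a matrix derived by the patent's least squares step, and both function names are hypothetical.

```python
def channel_statistics(s_l, s_r):
    """Per-bin statistics for one time-frequency point.

    s_l, s_r -- complex short-time spectral values S_L(t,f) and S_R(t,f)
    """
    e_l, e_r = abs(s_l) ** 2, abs(s_r) ** 2
    p_sum = e_l + e_r                      # assumed: sum of energy spectra
    p_diff = e_l - e_r                     # assumed: difference of energy spectra
    p_cc = (s_l * s_r.conjugate()).real    # P_cc = R{S_L S_R^*}, as in the text
    return p_sum, p_diff, p_cc

def apply_direct_matrix(m_d, s_l, s_r):
    """Formula 1: S'(t,f) = M_D(t,f) [S_L(t,f) S_R(t,f)]^T for a 2x2 M_D."""
    (a, b), (c, d) = m_d
    return a * s_l + b * s_r, c * s_l + d * s_r

s_l, s_r = 1 + 1j, 2 + 2j
p_sum, p_diff, p_cc = channel_statistics(s_l, s_r)
# Placeholder matrix: averages the two channels into both outputs.
direct = apply_direct_matrix(((0.5, 0.5), (0.5, 0.5)), s_l, s_r)
```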

Step 202: perform bandwidth extension on the diffuse sound according to a preset band extension method.

Specifically, step 202 applies a traditional band extension method directly to the diffuse sound; this is not described further here.

Step 203: separate the direct sound into a plurality of point sound sources in different directions, and perform bandwidth extension on each point sound source to obtain a plurality of bandwidth-extended point sound sources.

Specifically, separating the direct sound into a plurality of point sound sources in different directions in step 203 includes:

computing the direction information of the direct sound at each time-frequency point, and clustering the direction information of all time-frequency points to obtain the cluster centers of the direction information, where each cluster center corresponds to the direction information of one point sound source;

obtaining a masking matrix from the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;

separating the direct sound using the masking matrix to obtain a plurality of point sound sources in different directions.
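The clustering and masking steps above can be sketched as follows. This is an illustration under stated assumptions: the text does not name a clustering algorithm or a mask rule, so plain 1-D k-means and a binary nearest-center mask are used here, and the direction values are made-up numbers. The names `kmeans_1d` and `binary_masks` are hypothetical.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain 1-D k-means; `centers` holds the initial guesses."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def binary_masks(theta, centers):
    """m_i = 1 where theta is closest to center C_i, else 0 (assumed mask rule)."""
    masks = [[0] * len(theta) for _ in centers]
    for k, v in enumerate(theta):
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        masks[nearest][k] = 1
    return masks

# Direction estimates (degrees) for six time-frequency points, flattened.
theta = [-31.0, -29.0, -30.0, 19.0, 21.0, 20.0]
centers = kmeans_1d(theta, centers=[-45.0, 45.0])
masks = binary_masks(theta, centers)
```

Multiplying the direct sound spectrum by `masks[i]` point by point would then keep only the time-frequency points assigned to source `i`.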

Specifically, performing bandwidth extension on each of the point sound sources includes:

inputting each point sound source into a preset state-space model that fits the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal, estimating the spectral envelope of the high-frequency components of the wideband short-time spectrum according to a preset error criterion, and combining it with the low-frequency spectral envelope and the spectral detail extended by an appropriate spectral patching method, to obtain the bandwidth-extended point sound sources.

More specifically, fitting the mapping between the narrowband and wideband short-time spectra in the state-space model and estimating the spectral envelope of the high-frequency components according to a preset error criterion includes:

obtaining the hidden state vector of the state-space model from the hidden state vector at the previous time step and the short-time spectrum of the narrowband signal at the previous time step;

obtaining the short-time spectrum of the wideband signal from the hidden state vector of the state-space model and the short-time spectrum of the narrowband signal at the current time step.

Step 204: remix the bandwidth-extended point sound sources according to the pre-estimated azimuth information to obtain the bandwidth-extended direct sound, and reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.

Specifically, the pre-estimated azimuth information is estimated from the cluster centers of the direction information; the estimation method is conventional in the art and is not described further here.

Specifically, the wideband stereo audio signal is reconstructed from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound using formula 2. The three quantities in formula 2 are the short-time spectrum of the bandwidth-extended stereo signal, the short-time spectrum of the bandwidth-extended direct sound, and the short-time spectrum of the bandwidth-extended diffuse sound.

Corresponding to the method embodiment of the present invention, a bandwidth extension device for stereo audio is provided. Fig. 3 is a schematic structural diagram of this device. As shown in Fig. 3, the bandwidth extension device for stereo audio according to the device embodiment includes: a decomposition module 30, a diffuse sound extension module 32, a direct sound separation and extension module 34, and a reconstruction module 36. Each module is described in detail below.

Specifically, the decomposition module 30 is configured to decompose the stereo signal into direct sound and diffuse sound;

the diffuse sound extension module 32 is configured to perform bandwidth extension on the diffuse sound according to a preset band extension method;

the direct sound separation and extension module 34 is configured to separate the direct sound into a plurality of point sound sources in different directions, and to perform bandwidth extension on each point sound source to obtain a plurality of bandwidth-extended point sound sources;

the reconstruction module 36 is configured to remix the bandwidth-extended point sound sources according to the pre-estimated azimuth information to obtain the bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.

The decomposition module 30 is specifically configured to perform the processing of step 201 described above.

The direct sound separation and extension module 34 is specifically configured to:

input each point sound source into a preset state-space model that fits the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal, estimate the spectral envelope of the high-frequency components of the wideband short-time spectrum according to a preset error criterion, and combine it with the low-frequency spectral envelope and the spectral detail extended by an appropriate spectral patching method, finally obtaining the bandwidth-extended direct sound.

To explain the technical solution of the present invention in more detail, Example 1 is given. Fig. 4 is a block diagram of the principle of the bandwidth extension method for stereo audio in Example 1. As shown in Fig. 4, the bandwidth extension method for stereo audio includes the following steps:

1. Direct sound/diffuse sound separation

The stereo extension system proposed here uses the discrete Fourier transform or a quadrature mirror filter bank to transform the framed left and right channel audio signals into the frequency domain, and divides the result into multiple uniform sub-bands or critical bands according to the principles of human auditory perception. The short-time spectrum $S(t,f)$ of the input stereo signal can then be expressed as

$$S(t,f) = [S_L(t,f)\ \ S_R(t,f)]^T$$

where $t$ and $f$ denote the time frame index and the sub-band index of the signal, and $S_L(t,f)$ and $S_R(t,f)$ denote the short-time spectral components of the left and right channels of the stereo signal.
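The framing and transform step can be sketched with the standard library alone. This is only an illustration: it uses a naive DFT with a Hann window, which is one possible choice (the text names the DFT or a filter bank but no window), and the frame length, hop size and test tone are made-up values. The function name `stft` is hypothetical.

```python
import cmath
import math

def stft(signal, frame_len=8, hop=4):
    """Frame, window (Hann) and transform a mono signal with a naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # Keep the non-negative frequencies only (bins 0 .. frame_len/2).
        spec = [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
                for k in range(frame_len // 2 + 1)]
        frames.append(spec)
    return frames

left = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]   # tone at bin 2
right = [0.5 * x for x in left]
S_L, S_R = stft(left), stft(right)
# S(t,f) = [S_L(t,f) S_R(t,f)]^T, stored as one (left, right) pair per bin.
S = [[(a, b) for a, b in zip(fl, fr)] for fl, fr in zip(S_L, S_R)]
```

Grouping the bins into uniform sub-bands or critical bands would be a further step on top of this per-bin representation.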

To separate the direct sound and the diffuse sound effectively, the system also computes the sum $P_{sum}$ and the difference $P_{diff}$ of the left and right channel signal energy spectra, and the cross-correlation $P_{cc}$ of the two channels:

$$P_{cc} = R\{S_L(t,f)\,S_R^*(t,f)\}$$

where $R\{\cdot\}$ takes the real part. To improve the stability of the separation algorithm, the computed $P_{sum}$, $P_{diff}$ and $P_{cc}$ are each smoothed over time.
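The text does not say which smoother is used. As one common option, a first-order recursive average over time frames could look like the sketch below; the smoothing factor `alpha` and the function name `smooth` are assumptions for illustration.

```python
def smooth(values, alpha=0.8):
    """First-order recursive averaging over time frames for one frequency bin.

    alpha is a hypothetical smoothing factor in [0, 1); larger means smoother.
    """
    smoothed, state = [], None
    for v in values:
        # The first frame initializes the state; later frames are blended in.
        state = v if state is None else alpha * state + (1 - alpha) * v
        smoothed.append(state)
    return smoothed

p_cc_track = [1.0, 0.0, 1.0, 0.0]        # made-up P_cc values over four frames
p_cc_smooth = smooth(p_cc_track, alpha=0.5)
```

The same function would be applied separately to the tracks of $P_{sum}$ and $P_{diff}$ for each frequency bin.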

The direct sound components in the left and right stereo channels are highly correlated and can be represented as point sound source signals arriving from particular directions. Accordingly, the proposed system uses a direct sound matrix to separate the direct sound component $S'(t,f)$ directly from the original two-channel stereo signal $S(t,f)$:

$$S'(t,f) = [S_L'(t,f)\ \ S_R'(t,f)]^T = M_D(t,f)\,[S_L(t,f)\ \ S_R(t,f)]^T = M_D(t,f)\,S(t,f)$$

where $S_L'(t,f)$ and $S_R'(t,f)$ denote the short-time spectral components of the left and right channels of the direct sound, and $M_D(t,f)$ is the direct sound matrix. As described in [M. Vinton, D. McGrath, C. Robinson, P. Brown, "Next generation surround decoding and upmixing for consumer and professional applications," AES 57th International Conference, USA, 2015], the direct sound matrix $M_D(t,f)$ can be obtained by the least squares method, so that the expected squared error between the estimated direct sound component and the true component is minimized.

The direct sound matrix $M_D(t,f)$ is then computed from the smoothed statistics $P_{sum}$, $P_{diff}$ and $P_{cc}$.

The diffuse sound component $S''(t,f)$ can be expressed as the difference between the original stereo signal and the direct sound component:

$$S''(t,f) = S(t,f) - S'(t,f)$$
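The closed-form expression for $M_D(t,f)$ in terms of $P_{sum}$, $P_{diff}$ and $P_{cc}$ is not reproduced in this text, so the sketch below does not implement it. Instead it illustrates the least squares idea itself on synthetic data where the true direct component is known: the matrix minimizing the summed squared error is $R_{ds} R_{ss}^{-1}$, and the diffuse part is the residual. All helper names and the synthetic signal model (one panned source plus uncorrelated noise) are assumptions for illustration.

```python
import random

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def outer_sum(xs, ys):
    """Sum over samples of x y^H for 2-element complex vectors."""
    return [[sum(x[i] * y[j].conjugate() for x, y in zip(xs, ys))
             for j in range(2)] for i in range(2)]

def least_squares_matrix(direct, mix):
    """M minimizing sum ||d - M s||^2, i.e. M = R_ds R_ss^{-1}."""
    return matmul2(outer_sum(direct, mix), inv2(outer_sum(mix, mix)))

def apply2(m, s):
    return [m[0][0] * s[0] + m[0][1] * s[1], m[1][0] * s[0] + m[1][1] * s[1]]

random.seed(0)

def cgauss(scale=1.0):
    return complex(random.gauss(0, scale), random.gauss(0, scale))

# Synthetic frames for one frequency bin: a panned source plus uncorrelated noise.
direct = [[0.8 * g, 0.6 * g] for g in (cgauss() for _ in range(500))]
mix = [[d[0] + cgauss(0.3), d[1] + cgauss(0.3)] for d in direct]

m_d = least_squares_matrix(direct, mix)       # oracle: uses the known direct part
est = [apply2(m_d, s) for s in mix]           # S' = M_D S
diffuse = [[s[0] - e[0], s[1] - e[1]] for s, e in zip(mix, est)]   # S'' = S - S'
```

In a real system the true direct component is unknown, which is why the patent derives the matrix from observable channel statistics rather than from an oracle as done here.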

2. Sound source separation of the direct sound component

From $S'(t,f) = [S_L'(t,f)\ \ S_R'(t,f)]^T$, formula 3 gives the direction information $\theta(t,f)$ of the direct sound $S'(t,f)$ at each time-frequency point; the direction information $\theta(t,f)$ at a given time-frequency point is the same as the direction information $\theta_i$ of the corresponding point sound source;

the direction information $\theta(t,f)$ of all time-frequency points is clustered to obtain the cluster centers $C_i$, $i = 1, 2, \ldots, N$, of the direction information; these cluster centers correspond to the direction information $\theta_1, \theta_2, \theta_3, \ldots, \theta_N$ of the point sound sources $S_1(t,f), S_2(t,f), S_3(t,f), \ldots, S_N(t,f)$;

the masking matrix $m_i(t,f)$ is obtained from the direction information $\theta(t,f)$ of the direct sound $S'(t,f)$ at a given time-frequency point and the cluster centers $C_i$;

using the masking matrix $m_i(t,f)$, the direct sound $S'(t,f)$ is separated according to formula 4 to obtain the direct point sound sources $S_i(t,f)$, $i = 1, 2, \ldots, N$.

3. Bandwidth extension

Using the method described above, the diffuse sound component $S''(t,f)$ and the direct sound component $S'(t,f)$ are separated from the stereo signal, and the direct sound component $S'(t,f)$ is further separated into multiple point sound sources $S_i(t,f)$, $i = 1, 2, \ldots, N$, by exploiting time-frequency sparsity. A monophonic band extension method can then be applied independently to the diffuse sound $S''(t,f)$ and to each direct point sound source $S_i(t,f)$.

Here a state-space model is used to fit the mapping between narrowband and wideband spectral parameters directly, and during extension the spectral envelope of the high-frequency components is estimated according to a given error criterion:

$$S_Y(t,f) = F[S_X(t,f)]$$

where $S_X(t,f)$ and $S_Y(t,f)$ denote the short-time spectra of the narrowband and wideband signals, and $F[\cdot]$ denotes the mapping (or estimation) function.

In the state-space model, the mapping function $F[\cdot]$ is described by two processes, a state evolution function $F_{state}[\cdot]$ and an observation function $F_{obs}[\cdot]$:

$$S_{hidden}(t,f) = F_{state}[S_{hidden}(t-1,f),\ S_X(t-1,f),\ N_1(t,f)]$$

$$S_Y(t,f) = F_{obs}[S_{hidden}(t,f),\ S_X(t,f),\ N_2(t,f)]$$

where $S_{hidden}(t,f)$ is the hidden state vector of the model, and $N_1(t,f)$ and $N_2(t,f)$ describe the errors of the state evolution function $F_{state}$ and the observation function $F_{obs}$ respectively. In this model, the hidden state vector $S_{hidden}(t,f)$ at the current time step is determined by the hidden state vector $S_{hidden}(t-1,f)$ and the narrowband short-time spectrum $S_X(t-1,f)$ at the previous time step, while the wideband short-time spectrum $S_Y(t,f)$ at the current time step is determined by the current hidden state vector $S_{hidden}(t,f)$ and the current narrowband short-time spectrum $S_X(t,f)$. The recursive hidden-state structure of the state-space model allows the complex mapping between narrowband and wideband spectral parameters to be fitted more accurately. The model can be implemented with a generalized Kalman filtering method, or with two mutually independent deep neural networks. The basic principle of the state-space model based on deep neural networks is shown in Fig. 5. Here, the state evolution function $F_{state}$ and the observation function $F_{obs}$ can be implemented with various feedforward and recurrent deep neural networks, such as stacked autoencoders, multilayer perceptrons, time-delay recurrent networks, and long short-term memory networks.
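The two-step recursion can be sketched as follows. The functions `f_state` and `f_obs` here are toy stand-ins for trained networks, with made-up coefficients, and the error terms $N_1$ and $N_2$ are omitted; only the data flow (previous state and previous narrowband frame into the state, current state and current narrowband frame into the output) follows the text.

```python
import math

def f_state(h_prev, x_prev, a=0.5, b=0.5):
    """Toy state evolution: h(t) = tanh(a*h(t-1) + b*x(t-1)), element-wise."""
    return [math.tanh(a * h + b * x) for h, x in zip(h_prev, x_prev)]

def f_obs(h, x, gain=0.3):
    """Toy observation: append a high band predicted from h(t) to the low band x(t)."""
    return list(x) + [gain * abs(hv) for hv in h]

def extend_sequence(narrow_frames):
    """Run the two-step recursion over a list of narrowband spectra."""
    dim = len(narrow_frames[0])
    h = [0.0] * dim            # initial hidden state (assumed zero)
    x_prev = [0.0] * dim       # no previous frame exists before the first one
    wide_frames = []
    for x in narrow_frames:
        h = f_state(h, x_prev)             # uses h(t-1) and x(t-1)
        wide_frames.append(f_obs(h, x))    # uses h(t) and x(t)
        x_prev = x
    return wide_frames

frames = [[1.0, 0.5], [0.8, 0.4], [0.6, 0.3]]
wide = extend_sequence(frames)
```

Replacing the two toy functions with trained models would leave `extend_sequence` unchanged, which is the practical appeal of keeping the state update and the observation step separate.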

4. Stereo signal synthesis

The mono band extension method can be applied separately to the diffuse sound S''(t,f) and to each of the N direct point sound sources, yielding the corresponding wideband spectrum S_Y(t,f) for each. The direction information θ_i of each point sound source can then be used to reproduce the wideband direct sound:

[formula image not available in the source text]

where the summed terms are the wideband spectra of the point sound sources after extension, and the result is the short-time spectrum of the wideband direct sound after extension. Finally, combining this with the extended wideband diffuse sound, the wideband stereo signal can be reproduced:

[formula image not available in the source text]
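The remixing step can be illustrated with a short sketch. The remixing formula itself is not reproduced in this text, so the code assumes a constant-power (cosine/sine) panning law and assumes that a two-channel extended diffuse sound is available. Both are illustrative assumptions, and the function names `remix_direct` and `reconstruct_stereo` are hypothetical.

```python
import numpy as np

def remix_direct(sources, thetas):
    """Pan each extended point source into two channels and sum them.

    sources: complex array of shape (N, T, F), wideband spectra of N point sources.
    thetas:  N pan angles in radians, 0 = fully left, pi/2 = fully right
             (an assumed convention, not taken from the source).
    """
    thetas = np.asarray(thetas, dtype=float)
    # Constant-power gains: cos^2 + sin^2 = 1 for every source (assumed panning law).
    gains = np.stack([np.cos(thetas), np.sin(thetas)])    # shape (2, N)
    # Weighted sum over sources gives shape (2, T, F): left and right direct sound.
    return np.einsum("cn,ntf->ctf", gains, sources)

def reconstruct_stereo(direct_lr, diffuse_lr):
    """Add the extended diffuse sound back to the remixed direct sound, per channel."""
    return direct_lr + diffuse_lr

# Two hypothetical sources, 3 frames, 4 frequency bins each.
sources = np.ones((2, 3, 4), dtype=complex)
direct = remix_direct(sources, [0.0, np.pi / 2])           # one hard left, one hard right
stereo = reconstruct_stereo(direct, 0.5 * np.ones_like(direct))
```

Any other panning law that maps θ_i to a pair of channel gains could be substituted in `remix_direct` without changing the rest of the sketch.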

The above description is only an embodiment of the present invention and is not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of the claims of the present invention.

Claims

1. A method of bandwidth extension for stereo audio, comprising:

decomposing the stereo signal into direct sound and diffuse sound;

performing bandwidth extension on the diffuse sound according to a preset frequency band extension method;

separating the direct sound into a plurality of point sound sources in different directions, and respectively performing bandwidth expansion on the plurality of point sound sources to obtain a plurality of point sound sources with expanded bandwidths;

and remixing the plurality of point sound sources after bandwidth expansion according to the pre-estimated azimuth information to obtain direct sound after bandwidth expansion, and reconstructing a broadband stereo audio signal according to the direct sound after bandwidth expansion and the diffuse sound after bandwidth expansion.

2. The method of bandwidth extension of stereo audio according to claim 1, wherein said decomposing the stereo signal into direct sound and diffuse sound comprises:

decomposing the stereo signal into a left channel and a right channel;

performing a time-frequency transform on the framed left channel and the framed right channel respectively, to obtain left-channel and right-channel short-time spectral components of the stereo signal;

obtaining, from the left-channel and right-channel short-time spectral components, the sum P_sum of the energy spectra of the left and right channel signals, the difference P_diff between the energy spectra of the left and right channel signals, and the cross-correlation P_cc between the energy spectra of the left and right channel signals;

obtaining a direct sound matrix from said P_sum, P_diff and P_cc by the least squares method;

separating direct sound from the stereo signal using the direct sound matrix;

and eliminating the direct sound from the stereo signal to obtain diffuse sound.

3. The method of bandwidth extension for stereo audio according to claim 1, wherein separating the direct sound into a plurality of point sound sources in different directions comprises:

calculating the direction information of the direct sound at each time-frequency point, and clustering the direction information of all time-frequency points to obtain cluster centers of the direction information, wherein the cluster centers respectively correspond to the direction information of each point sound source;

obtaining a masking matrix according to the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;

and separating the direct sound by using the masking matrix to obtain a plurality of point sound sources in different directions.

4. The method of bandwidth extension for stereo audio according to claim 1, wherein performing bandwidth expansion on the plurality of point sound sources respectively comprises:

inputting each of the plurality of point sound sources into a preset state space model to fit the mapping relation between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal, estimating the spectral envelope of the high-frequency component of the short-time spectrum of the wideband signal according to a preset error criterion, and combining the low-frequency spectral envelope with the spectral details extended by a suitable spectrum patching method, to obtain the plurality of point sound sources after bandwidth expansion.

5. The method of bandwidth extension for stereo audio according to claim 4, wherein fitting the mapping relation between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal in the state space model, and estimating the spectral envelope of the high-frequency component according to a preset error criterion, comprises:

obtaining the hidden state vector in the state space model by using the hidden state vector at the previous moment and the short-time spectrum of the narrowband signal at the previous moment;

and obtaining the short-time spectrum of the wideband signal by using the hidden state vector in the state space model and the short-time spectrum of the narrowband signal at the current moment.

6. An apparatus for bandwidth extension of stereo audio, comprising: a decomposition module, a diffuse sound expansion module, a direct sound separation and expansion module, and a reconstruction module;

the decomposition module is used for decomposing the stereo signal into direct sound and diffuse sound;

the diffuse sound expansion module is used for performing bandwidth expansion on diffuse sound according to a preset frequency band expansion method;

the direct sound separation and expansion module is used for separating the direct sound into a plurality of point sound sources in different directions, and performing bandwidth expansion on the plurality of point sound sources respectively to obtain a plurality of point sound sources after bandwidth expansion;

and the reconstruction module is used for remixing the plurality of point sound sources after the bandwidth expansion according to the pre-estimated azimuth information to obtain direct sound after the bandwidth expansion, and reconstructing a broadband stereo audio signal according to the direct sound after the bandwidth expansion and the diffuse sound after the bandwidth expansion.

7. The apparatus of claim 6, wherein the decomposition module is specifically configured to:

decomposing the stereo signal into a left channel and a right channel;

obtaining, from the left-channel and right-channel short-time spectral components, the sum P_sum of the energy spectra of the left and right channel signals, the difference P_diff between the energy spectra of the left and right channel signals, and the cross-correlation P_cc between the energy spectra of the left and right channel signals;

obtaining a direct sound matrix from said P_sum, P_diff and P_cc by the least squares method;

separating direct sound from the stereo signal using the direct sound matrix;

8. The stereo audio bandwidth extension device of claim 6, wherein the direct sound separation and extension module is specifically configured to:

9. The stereo audio bandwidth extension device of claim 6, wherein the direct sound separation and extension module is specifically configured to:

10. The stereo audio bandwidth extension device of claim 9, wherein the direct sound separation and extension module is specifically configured to: