CN106960672A - Method and device for bandwidth expansion of stereo audio

Info

Publication number: CN106960672A
Application number: CN201710203054.1A
Authority: CN
Inventors: 高昕; 颜永红; 邹潇湘; 白海钏; 舒敏; 云晓春; 王锟; 张震; 计哲; 董琳; 金暐; 王中华; 李海灵; 李佳
Original assignee: Institute of Acoustics CAS; National Computer Network and Information Security Management Center
Current assignee: Institute of Acoustics CAS; National Computer Network and Information Security Management Center
Priority date: 2017-03-30
Filing date: 2017-03-30
Publication date: 2017-07-18
Anticipated expiration: 2037-03-30
Also published as: CN106960672B

Abstract

The invention discloses a method and device for bandwidth extension of stereo audio. The method comprises the following steps: decomposing a stereo signal into direct sound and diffuse sound; performing bandwidth extension on the diffuse sound according to a preset band-extension method; separating the direct sound into a plurality of point sound sources at different azimuths, and performing bandwidth extension on each point sound source to obtain a plurality of bandwidth-extended point sound sources; remixing the bandwidth-extended point sound sources according to pre-estimated azimuth information to obtain bandwidth-extended direct sound; and reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound and the bandwidth-extended diffuse sound. This technical scheme addresses the problem in the prior art that bandwidth extension is performed only according to the subjective quality of each single-channel reconstructed signal, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of the position and distance of a sound source.

Description

Method and device for bandwidth expansion of stereo audio

Technical field

The invention relates to the field of network technology applications, and in particular to a method and device for bandwidth extension of stereo audio.

Background

In digital audio signal processing, an audio signal covering the full 20 Hz to 20 kHz range perceivable by the human ear is usually called full-band audio; such signals are mainly used for high-fidelity reproduction of music. Current real-time audio communication systems cannot provide sufficient network transmission rate or terminal processing capability, so they inevitably limit the effective bandwidth of the reconstructed signal and give priority to quantizing and encoding the low-frequency components of the audio signal in order to improve coding efficiency.

Traditional telephone voice communication systems usually transmit narrowband signals occupying 300 to 3400 Hz at a sampling rate of 8 kHz. Subjective listening tests show that narrowband speech retains 91% of syllable intelligibility and 99% of sentence intelligibility. Compared with real speech, however, the naturalness and subjective quality of the narrowband signal transmitted in an actual call are noticeably degraded. Because the high-frequency components are missing, narrowband speech cannot distinguish some unvoiced sounds and plosives well, and its ability to convey speaker characteristics is weakened. To overcome these shortcomings, wideband audio has been widely adopted in telephone voice communication. Its effective bandwidth extends from 50 Hz to 7 kHz, which covers most of the spectrum that carries the important characteristics of speech and achieves a sound quality close to that of AM broadcasting. Owing to historical, economic, and technical constraints, however, a long transition period is still needed before traditional fixed and mobile communication fully moves from narrowband to wideband audio.

As an effective audio enhancement method, band extension leaves the narrowband source coding and network transmission unchanged. By analyzing the time-frequency characteristics of the original audio signal, it artificially restores, at the receiving end, the high-frequency components that were truncated at the encoder, thereby improving the perceived quality of the reconstructed audio. For hearing-impaired listeners, band extension can further improve the discrimination of phonemes and meaning. Over the past decade or so, many research institutions and researchers have proposed solutions for band extension of mono speech signals. These methods usually treat spectral envelope extension and spectral detail extension separately and then synthesize the high-frequency components of the signal; the principle is shown in Fig. 1. First, time-frequency features are extracted from the narrowband signal according to the principles of human auditory perception. Next, the spectral envelope and energy of the high-frequency components are estimated using the mapping between low-frequency and high-frequency features, as described by side information or prior knowledge. At the same time, an appropriate spectral patching method is selected to extend the spectral detail. Finally, the extended spectral envelope and spectral detail are combined to reconstruct the high-frequency components of the wideband audio signal.

For stereo audio, traditional band-extension methods mostly reconstruct the high-frequency components of the two channels independently. Such methods extend the signal bandwidth only according to the subjective quality of each single-channel reconstructed signal and do not consider the correlation of signal energy and phase between the two channels, so the reconstructed stereo signal seriously impairs the listener's judgment of the position and distance of a sound source.

Summary of the invention

In view of the above problems, the present invention provides a method and device for bandwidth extension of stereo audio.

The bandwidth extension method for stereo audio provided by the present invention comprises the following steps:

decomposing a stereo signal into direct sound and diffuse sound;

performing bandwidth extension on the diffuse sound according to a preset band-extension method;

separating the direct sound into a plurality of point sound sources at different azimuths, and performing bandwidth extension on each point sound source to obtain a plurality of bandwidth-extended point sound sources;

remixing the bandwidth-extended point sound sources according to pre-estimated azimuth information to obtain bandwidth-extended direct sound;

reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.

The present invention also provides a bandwidth extension device for stereo audio, comprising: a decomposition module, a diffuse sound extension module, a direct sound separation and extension module, and a reconstruction module;

the decomposition module is configured to decompose a stereo signal into direct sound and diffuse sound;

the diffuse sound extension module is configured to perform bandwidth extension on the diffuse sound according to a preset band-extension method;

the direct sound separation and extension module is configured to separate the direct sound into a plurality of point sound sources at different azimuths, and to perform bandwidth extension on each point sound source to obtain a plurality of bandwidth-extended point sound sources;

the reconstruction module is configured to remix the bandwidth-extended point sound sources according to pre-estimated azimuth information to obtain bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.

The beneficial effects of the present invention are as follows:

An embodiment of the present invention first uses the spectral correlation between channels to decompose the input stereo signal into a direct sound component and a diffuse sound component. The diffuse sound component is extended directly with a traditional band-extension method. The direct sound is separated into a plurality of point sound sources at different azimuths, based on the sparsity of different sound sources in the time-frequency structure, and each point sound source is bandwidth-extended separately. Finally, the extended point sound sources are remixed according to their azimuth information in the original stereo signal and combined with the bandwidth-extended diffuse sound component to reconstruct a wideband stereo audio signal. The present invention solves the problem in the prior art that bandwidth extension is performed only according to the subjective quality of each single-channel reconstructed signal, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of the position and distance of a sound source.

Description of the drawings

Fig. 1 is a basic flowchart of a prior-art band-extension method for mono speech signals;

Fig. 2 is a flowchart of the bandwidth extension method for stereo audio according to the method embodiment of the present invention;

Fig. 3 is a schematic structural diagram of the bandwidth extension device for stereo audio according to the device embodiment of the present invention;

Fig. 4 is a schematic block diagram of the bandwidth extension method for stereo audio in Example 1 of the present invention;

Fig. 5 is a schematic block diagram of the deep-neural-network-based state-space model in Example 1 of the present invention.

Detailed description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.

In the prior art, bandwidth extension is performed only according to the subjective quality of each single-channel reconstructed signal, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of the position and distance of a sound source. To solve this problem, the present invention provides a method and device for bandwidth extension of stereo audio, described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

According to the method embodiment of the present invention, a bandwidth extension method for stereo audio is provided. Fig. 2 is a flowchart of the bandwidth extension method for stereo audio according to the method embodiment. As shown in Fig. 2, the method includes the following processing:

Step 201: decompose the stereo signal into direct sound and diffuse sound.

Specifically, step 201 includes the following steps:

decomposing the stereo signal into a left channel and a right channel;

applying a time-frequency transform to the framed left channel and the framed right channel, respectively, to obtain the left-channel short-time spectral component and the right-channel short-time spectral component of the stereo signal;

obtaining, from the left-channel and right-channel short-time spectral components, the sum P_sum of the left and right channel signal energy spectra, the difference P_diff of the left and right channel signal energy spectra, and the cross-correlation P_cc between the left and right channel signal energy spectra;

obtaining the direct-sound matrix from P_sum, P_diff, and P_cc by the least-squares method;

separating the direct sound from the stereo signal using the direct-sound matrix;

subtracting the direct sound from the stereo signal to obtain the diffuse sound.

More specifically, obtaining P_sum, P_diff, and P_cc from the left-channel and right-channel short-time spectral components includes:

using the left-channel short-time spectral component S_L(t,f) and the right-channel short-time spectral component S_R(t,f) to calculate the cross-correlation P_cc according to the formula $P_{cc} = \mathrm{R}\{S_L(t,f)\,S_R^{*}(t,f)\}$, where R{·} denotes taking the real part.

More specifically, separating the direct sound from the stereo signal using the direct-sound matrix includes:

using the direct-sound matrix M_D(t,f) to separate the direct sound S'(t,f) from the stereo signal S(t,f) according to Formula 1:

$$S'(t,f) = M_D(t,f)\,[S_L(t,f)\;\; S_R(t,f)]^{T} \qquad \text{(Formula 1)}$$
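The per-bin statistics used in step 201 can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation. It assumes that P_sum and P_diff are the sum and difference of the squared magnitudes |S_L|² and |S_R|² (the text defines them only verbally), and the frame length, hop size, Hann window, and toy input are hypothetical choices.

```python
import cmath
import math

def dft_frame(frame):
    """Naive DFT of one real-valued frame; returns bins 0..N/2."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
            for b in range(n // 2 + 1)]

def short_time_spectrum(signal, frame_len=8, hop=4):
    """Frame the signal, apply a Hann window, and transform each frame.
    frame_len and hop are illustrative values, not taken from the patent."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / frame_len) for k in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(dft_frame([signal[start + k] * window[k] for k in range(frame_len)]))
    return frames  # indexed as frames[t][f]

def channel_statistics(s_left, s_right):
    """Per-bin P_sum, P_diff (assumed definitions) and P_cc = R{S_L * conj(S_R)}."""
    p_sum, p_diff, p_cc = [], [], []
    for row_l, row_r in zip(s_left, s_right):
        p_sum.append([abs(a) ** 2 + abs(b) ** 2 for a, b in zip(row_l, row_r)])
        p_diff.append([abs(a) ** 2 - abs(b) ** 2 for a, b in zip(row_l, row_r)])
        p_cc.append([(a * b.conjugate()).real for a, b in zip(row_l, row_r)])
    return p_sum, p_diff, p_cc

# Toy input: the right channel is the left channel attenuated by 0.5,
# i.e. one fully correlated source panned toward the left.
left = [math.sin(2 * math.pi * 0.125 * n) for n in range(32)]
right = [0.5 * x for x in left]
S_L = short_time_spectrum(left)
S_R = short_time_spectrum(right)
P_sum, P_diff, P_cc = channel_statistics(S_L, S_R)
```

For this fully correlated toy input every bin satisfies P_cc = 0.4 · P_sum and P_diff = 0.6 · P_sum, which is the kind of inter-channel relationship the least-squares estimate of M_D(t,f) relies on.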

Step 202: perform bandwidth extension on the diffuse sound according to a preset band-extension method.

Specifically, step 202 directly uses a traditional band-extension method to extend the bandwidth of the diffuse sound, which is not described further here.

Step 203: separate the direct sound into a plurality of point sound sources at different azimuths, and perform bandwidth extension on each point sound source to obtain a plurality of bandwidth-extended point sound sources.

Specifically, separating the direct sound into a plurality of point sound sources at different azimuths in step 203 includes:

calculating the direction information of the direct sound at each time-frequency point, and clustering the direction information of all time-frequency points to obtain the cluster centers of the direction information, where the cluster centers correspond to the direction information of the respective point sound sources;

obtaining a masking matrix from the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;

separating the direct sound using the masking matrix to obtain a plurality of point sound sources at different azimuths.

Specifically, performing bandwidth extension on each of the point sound sources includes:

inputting each point sound source into a preset state-space model that fits the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal, estimating the spectral envelope of the high-frequency components of the wideband short-time spectrum according to a preset error criterion, and combining the low-frequency spectral envelope with the spectral detail extended by an appropriate spectral patching method, to obtain a plurality of bandwidth-extended point sound sources.

More specifically, fitting the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal in the state-space model, and estimating the spectral envelope of the high-frequency components according to a preset error criterion, includes:

obtaining the hidden state vector of the state-space model from the hidden state vector at the previous time instant and the short-time spectrum of the narrowband signal at the previous time instant;

obtaining the short-time spectrum of the wideband signal from the hidden state vector of the state-space model and the short-time spectrum of the narrowband signal at the current time instant.

Step 204: remix the bandwidth-extended point sound sources according to the pre-estimated azimuth information to obtain bandwidth-extended direct sound, and reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.

Specifically, the pre-estimated azimuth information is estimated from the cluster centers of the direction information. The estimation method is a conventional technique in the art and is not described further here.

Specifically, Formula 2 is used to reconstruct the wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound;

The three terms of Formula 2 denote, respectively, the short-time spectrum of the bandwidth-extended stereo signal, the short-time spectrum of the bandwidth-extended direct sound, and the short-time spectrum of the bandwidth-extended diffuse sound.

Corresponding to the method embodiment of the present invention, a bandwidth extension device for stereo audio is provided. Fig. 3 is a schematic structural diagram of the bandwidth extension device for stereo audio according to the device embodiment. As shown in Fig. 3, the device includes: a decomposition module 30, a diffuse sound extension module 32, a direct sound separation and extension module 34, and a reconstruction module 36. Each module is described in detail below.

Specifically, the decomposition module 30 is configured to decompose the stereo signal into direct sound and diffuse sound;

the diffuse sound extension module 32 is configured to perform bandwidth extension on the diffuse sound according to a preset band-extension method;

the direct sound separation and extension module 34 is configured to separate the direct sound into a plurality of point sound sources at different azimuths, and to perform bandwidth extension on each point sound source to obtain a plurality of bandwidth-extended point sound sources;

the reconstruction module 36 is configured to remix the bandwidth-extended point sound sources according to the pre-estimated azimuth information to obtain bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.

The decomposition module 30 is specifically configured to:

The direct sound separation and extension module 34 is specifically configured to:

input each point sound source into a preset state-space model that fits the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal, estimate the spectral envelope of the high-frequency components of the wideband short-time spectrum according to a preset error criterion, and combine the low-frequency spectral envelope with the spectral detail extended by an appropriate spectral patching method, finally obtaining the bandwidth-extended direct sound.

To explain the technical solution of the present invention in more detail, Example 1 is given. Fig. 4 is a schematic block diagram of the bandwidth extension method for stereo audio in Example 1. As shown in Fig. 4, a bandwidth extension method for stereo audio includes the following steps:

1. Direct sound / diffuse sound separation

The stereo extension system proposed here uses the discrete Fourier transform or a quadrature mirror filter bank to convert the framed left and right channel audio signals to the frequency domain, and divides the result into multiple uniform subbands or critical bands according to the principles of human auditory perception. The short-time spectrum S(t,f) of the input stereo signal can then be expressed as

$$S(t,f) = [S_L(t,f)\;\; S_R(t,f)]^{T}$$

where t and f denote the time frame and subband index of the signal, and S_L(t,f) and S_R(t,f) denote the short-time spectral components of the left and right channels of the stereo signal.

To separate the direct sound from the diffuse sound effectively, the system also calculates the sum P_sum and the difference P_diff of the left and right channel signal energy spectra, as well as the cross-correlation P_cc of the two channels:

$$P_{cc} = \mathrm{R}\{S_L(t,f)\,S_R^{*}(t,f)\}$$

where R{·} denotes taking the real part. To improve the stability of the separation algorithm, the calculated P_sum, P_diff, and P_cc are each smoothed over time.
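The text does not specify how the time smoothing is performed. One common choice is first-order recursive (exponential) smoothing along the frame axis, sketched below; the function name and the smoothing factor `alpha` are hypothetical and not taken from the patent.

```python
def smooth_over_time(stat, alpha=0.9):
    """First-order recursive smoothing of stat[t][f] along the frame axis t.
    alpha close to 1 means heavier smoothing (assumed parameter)."""
    smoothed = []
    previous = None
    for row in stat:
        if previous is None:
            current = list(row)  # first frame: nothing to average with yet
        else:
            current = [alpha * p + (1.0 - alpha) * x for p, x in zip(previous, row)]
        smoothed.append(current)
        previous = current
    return smoothed

# Three frames, two subbands.
raw = [[1.0, 0.0], [0.0, 0.0], [0.0, 4.0]]
print(smooth_over_time(raw, alpha=0.5))  # [[1.0, 0.0], [0.5, 0.0], [0.25, 2.0]]
```

The same function would be applied separately to P_sum, P_diff, and P_cc before they are used to estimate the direct-sound matrix.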

The direct sound components in the left and right stereo channels are highly correlated and can be represented as a point source signal arriving from a certain direction. Accordingly, the proposed system uses a direct-sound matrix to separate the direct sound component S'(t,f) directly from the original two-channel stereo signal S(t,f):

$$S'(t,f) = [S_L'(t,f)\;\; S_R'(t,f)]^{T} = M_D(t,f)\,[S_L(t,f)\;\; S_R(t,f)]^{T} = M_D(t,f)\,S(t,f)$$

where S_L'(t,f) and S_R'(t,f) denote the left and right channel short-time spectral components of the direct sound, and M_D(t,f) is the direct-sound matrix. According to [M. Vinton, D. McGrath, C. Robinson, P. Brown, "Next generation surround decoding and upmixing for consumer and professional applications", AES 57th International Conference, USA, 2015], the direct-sound matrix M_D(t,f) can be obtained by the least-squares method, so that the expected squared error between the estimated direct sound component and the true direct sound component is minimized.

The direct-sound matrix M_D(t,f) is then calculated under this criterion from P_sum, P_diff, and P_cc.

The diffuse sound component S''(t,f) can be expressed as the difference between the original stereo signal and the direct sound component:

$$S''(t,f) = S(t,f) - S'(t,f)$$
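Applying the direct-sound matrix and forming the residual is a 2×2 matrix-vector product per time-frequency bin. The sketch below shows this for a single bin. The numeric values of `M_D` are hypothetical placeholders chosen for illustration; in the described method M_D(t,f) would come from the least-squares estimate.

```python
def apply_direct_matrix(m_d, s):
    """Return (S', S'') for one time-frequency bin.
    m_d: 2x2 nested list, the direct-sound matrix M_D(t,f).
    s:   [S_L, S_R], complex short-time spectral values."""
    direct = [m_d[0][0] * s[0] + m_d[0][1] * s[1],
              m_d[1][0] * s[0] + m_d[1][1] * s[1]]      # S' = M_D S
    diffuse = [s[0] - direct[0], s[1] - direct[1]]      # S'' = S - S'
    return direct, diffuse

# Hypothetical matrix and spectral values for one bin.
M_D = [[0.8, 0.1], [0.1, 0.6]]
S = [1.0 + 2.0j, 0.5 - 1.0j]
S_direct, S_diffuse = apply_direct_matrix(M_D, S)
```

By construction the two components add back to the input bin, so the decomposition itself loses no signal energy; any error lies in how well M_D(t,f) isolates the correlated part.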

2. Source separation of the direct sound component

From S'(t,f) = [S_L'(t,f) S_R'(t,f)]^T, Formula 3 gives the direction information θ(t,f) of the direct sound S'(t,f) at a given time-frequency point. The direction information θ(t,f) of the direct sound S'(t,f) at that time-frequency point is the same as the direction information θ_i of the point sound source;

The direction information θ(t,f) of all time-frequency points is clustered to obtain the cluster centers C_i, i = 1, 2, …, N, of the direction information. These cluster centers correspond to the direction information θ_1, θ_2, θ_3, …, θ_N of the respective point sound sources S_1(t,f), S_2(t,f), S_3(t,f), …, S_N(t,f);

The masking matrix m_i(t,f) is obtained from the direction information θ(t,f) of the direct sound S'(t,f) at a given time-frequency point and the cluster centers C_i;

Using the masking matrix m_i(t,f), the direct sound S'(t,f) is separated according to Formula 4 to obtain the direct point sound sources S_i(t,f), i = 1, 2, …, N.
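The clustering and masking steps can be illustrated with the sketch below. Formulas 3 and 4 are not reproduced in this text, so the sketch substitutes stated assumptions: the direction cue is taken to be a panning angle atan2(|S_R'|, |S_L'|), the clustering is plain 1-D k-means, the mask is binary (nearest cluster center), and the mask is applied by multiplication. All function names are hypothetical.

```python
import math

def panning_angle(s_left, s_right):
    """Assumed direction cue in [0, pi/2]: 0 = fully left, pi/2 = fully right."""
    return math.atan2(abs(s_right), abs(s_left))

def kmeans_1d(values, n_clusters, n_iter=20):
    """Plain 1-D k-means; centers start evenly spread over the value range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / n_clusters for i in range(n_clusters)]
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(n_clusters), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster received no points.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def binary_masks(theta, centers):
    """m_i(t,f) = 1 where center i is nearest to theta(t,f), otherwise 0."""
    masks = [[[0] * len(row) for row in theta] for _ in centers]
    for t, row in enumerate(theta):
        for f, angle in enumerate(row):
            nearest = min(range(len(centers)), key=lambda i: abs(angle - centers[i]))
            masks[nearest][t][f] = 1
    return masks

# Toy direct sound: bin f=0 is dominated by a left source, bin f=1 by a right source.
direct_L = [[1.0 + 0j, 0.2 + 0j], [0.9 + 0j, 0.1 + 0j]]
direct_R = [[0.1 + 0j, 1.0 + 0j], [0.2 + 0j, 0.8 + 0j]]
theta = [[panning_angle(l, r) for l, r in zip(row_l, row_r)]
         for row_l, row_r in zip(direct_L, direct_R)]
centers = kmeans_1d([a for row in theta for a in row], n_clusters=2)
masks = binary_masks(theta, centers)

# Each point source keeps only the bins assigned to it (assumed form of Formula 4).
point_sources = [
    [[(m * l, m * r) for m, l, r in zip(mask_row, row_l, row_r)]
     for mask_row, row_l, row_r in zip(mask, direct_L, direct_R)]
    for mask in masks
]
```

A binary mask relies on the time-frequency sparsity mentioned in the text: each bin is assumed to be dominated by a single source. A soft mask would be an alternative when sources overlap.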

3. Bandwidth extension

Using the method described above, the diffuse sound component S''(t,f) and the direct sound component S'(t,f) are separated from the stereo signal, and time-frequency sparsity is used to further separate the direct sound component S'(t,f) into multiple point sound sources. The diffuse sound S''(t,f) and each direct point sound source can then be bandwidth-extended independently using a mono band-extension method.

Here a state-space model is used to directly fit the mapping between narrowband and wideband spectral parameters, and in actual extension the spectral envelope of the high-frequency components is estimated according to a given error criterion:

$$S_Y(t,f) = F[S_X(t,f)]$$

where S_X(t,f) and S_Y(t,f) denote the short-time spectra of the narrowband and wideband signals, respectively, and F[·] denotes the mapping (or estimation) function.

In the state-space model, the mapping function F[·] is described by two processes, a state evolution function F_state[·] and an observation function F_obs[·]:

$$S_{hidden}(t,f) = F_{state}[S_{hidden}(t-1,f),\; S_X(t-1,f),\; N_1(t,f)]$$

$$S_Y(t,f) = F_{obs}[S_{hidden}(t,f),\; S_X(t,f),\; N_2(t,f)]$$

where S_hidden(t,f) is the hidden state vector of the model, and N_1(t,f) and N_2(t,f) describe the errors of the state evolution function F_state and the observation function F_obs, respectively. In this model, the hidden state vector S_hidden(t,f) at the current time instant is determined by the hidden state vector S_hidden(t-1,f) and the narrowband short-time spectrum S_X(t-1,f) at the previous time instant, while the wideband short-time spectrum S_Y(t,f) at the current time instant is determined by the current hidden state vector S_hidden(t,f) and the current narrowband short-time spectrum S_X(t,f). The recursive hidden-state structure of the state-space model allows the complex mapping between narrowband and wideband spectral parameters to be fitted more accurately. The model can be implemented with generalized Kalman filtering or with two mutually independent deep neural networks. The basic principle of the deep-neural-network-based state-space model is shown in Fig. 5. Here the state evolution function F_state and the observation function F_obs can be implemented with various feedforward and recurrent deep neural networks, such as stacked autoencoders, multilayer perceptrons, time-delay recurrent networks, and long short-term memory networks.
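The recursion defined by F_state and F_obs can be sketched structurally as follows. This is only an illustration of the data flow: each function is stood in for by a single dense tanh layer with random, untrained weights, the error terms N_1 and N_2 are omitted, and the feature sizes and helper names are hypothetical. A real system would use trained networks of the kinds listed above.

```python
import math
import random

def dense_tanh(weights, bias, inputs):
    """One fully connected layer with tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]

def random_layer(n_out, n_in, rng):
    """Untrained placeholder weights; a real model would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def extend_bandwidth(narrow_frames, state_layer, obs_layer, hidden_size):
    """Run the state-space recursion over a sequence of narrowband feature frames."""
    hidden = [0.0] * hidden_size                    # S_hidden before the first frame
    previous_narrow = [0.0] * len(narrow_frames[0])
    wide_frames = []
    for narrow in narrow_frames:
        # S_hidden(t) = F_state[S_hidden(t-1), S_X(t-1)]
        hidden = dense_tanh(*state_layer, hidden + previous_narrow)
        # S_Y(t) = F_obs[S_hidden(t), S_X(t)]
        wide_frames.append(dense_tanh(*obs_layer, hidden + narrow))
        previous_narrow = narrow
    return wide_frames

rng = random.Random(0)            # fixed seed so the sketch is repeatable
NARROW, HIDDEN, WIDE = 4, 3, 8    # hypothetical feature sizes
state_layer = random_layer(HIDDEN, HIDDEN + NARROW, rng)
obs_layer = random_layer(WIDE, HIDDEN + NARROW, rng)
narrow_frames = [[rng.uniform(-1, 1) for _ in range(NARROW)] for _ in range(5)]
wide_frames = extend_bandwidth(narrow_frames, state_layer, obs_layer, HIDDEN)
```

The loop makes the causal structure explicit: the wideband estimate for frame t depends on the current narrowband frame and, through the hidden state, on all earlier narrowband frames, but never on later ones.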

4. Stereo signal synthesis

Using the mono band extension method, the diffuse sound S”(t,f) and each of the N direct point sound sources can be extended separately to obtain the corresponding broadband spectrum S_Y(t,f). Next, the direction information θ_i of each point sound source can be used to reproduce the broadband direct sound.

This remixing takes the extended broadband spectrum of each point sound source and yields the short-time spectrum of the extended broadband direct sound. Finally, combining this with the extended broadband diffuse sound reproduces the broadband stereo signal.
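The remixing and synthesis step can be illustrated with the following Python sketch. The exact remixing formula is not reproduced in this text, so the panning law here is an assumption: each point source i is given a panning angle θ_i in [0, π/2] and is weighted by cos θ_i in the left channel and sin θ_i in the right channel (constant-power amplitude panning). The function name `remix_stereo` and the array shapes are hypothetical.

```python
import numpy as np

def remix_stereo(point_sources, thetas, diffuse_left, diffuse_right):
    """Remix extended point-source spectra into left/right broadband spectra.

    point_sources: complex array (N, T, F), one extended spectrum per source.
    thetas: array (N,), panning angles in [0, pi/2] (assumed convention).
    diffuse_left, diffuse_right: complex arrays (T, F), extended diffuse sound.
    """
    point_sources = np.asarray(point_sources)
    thetas = np.asarray(thetas, dtype=float)
    # Assumed constant-power panning gains, broadcast over time and frequency.
    g_left = np.cos(thetas)[:, None, None]
    g_right = np.sin(thetas)[:, None, None]
    # Broadband direct sound: gain-weighted sum of the point sources.
    direct_left = np.sum(g_left * point_sources, axis=0)
    direct_right = np.sum(g_right * point_sources, axis=0)
    # Broadband stereo spectrum: direct sound plus diffuse sound per channel.
    return direct_left + diffuse_left, direct_right + diffuse_right

# Two sources: one panned fully left, one fully right.
sources = np.ones((2, 3, 4), dtype=complex)
zeros = np.zeros((3, 4), dtype=complex)
left, right = remix_stereo(sources, [0.0, np.pi / 2], zeros, zeros)
```

Under this assumed panning law a source at θ = 0 appears only in the left channel and a source at θ = π/2 only in the right, so the inter-channel level relationship estimated before extension is preserved across the extended band.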

The above is only an embodiment of the present invention and is not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of its claims.

Claims

1. A method of bandwidth extension for stereo audio, comprising:

decomposing the stereo signal into direct sound and diffuse sound;

performing bandwidth extension on the diffuse sound according to a preset frequency band extension method;

separating the direct sound into a plurality of point sound sources in different directions, and respectively performing bandwidth expansion on the plurality of point sound sources to obtain a plurality of point sound sources with expanded bandwidths;

and remixing the plurality of point sound sources after bandwidth expansion according to the pre-estimated azimuth information to obtain direct sound after bandwidth expansion, and reconstructing a broadband stereo audio signal according to the direct sound after bandwidth expansion and the diffuse sound after bandwidth expansion.

2. The method of bandwidth extension of stereo audio according to claim 1, wherein said decomposing the stereo signal into direct sound and diffuse sound comprises:

decomposing the stereo signal into a left channel and a right channel;

respectively carrying out time-frequency transformation on the left channel and the right channel subjected to the framing processing to obtain a left channel short-time frequency spectrum component and a right channel short-time frequency spectrum component of the stereo signal;

obtaining, from the left and right channel short-time frequency spectrum components, the sum P_sum of the energy spectra of the left and right channel signals, the difference P_diff of the energy spectra of the left and right channel signals, and the cross-correlation P_cc between the energy spectra of the left and right channel signals;

obtaining a direct sound matrix through a least squares method using said P_sum, P_diff, and P_cc;

separating direct sound from the stereo signal using the direct sound matrix;

and eliminating the direct sound from the stereo signal to obtain diffuse sound.

3. The method of bandwidth extension for stereo audio according to claim 1, wherein said separating the direct sound into a plurality of point sound sources in different directions comprises:

calculating the direction information of direct sound on each time frequency point, clustering the direction information of all the time frequency points to obtain a clustering center of the direction information, wherein the clustering center respectively corresponds to the direction information of each point sound source;

obtaining a masking matrix according to the direction information of the direct sound at a certain time frequency point and the clustering center of the direction information;

and separating the direct sound by using the masking matrix to obtain a plurality of point sound sources in different directions.

4. The method of bandwidth extension for stereo audio according to claim 1, wherein said respectively performing bandwidth expansion on the plurality of point sound sources comprises:

respectively inputting a plurality of point sound sources into a preset state space model to fit the mapping relation between the short-time frequency spectrum of the narrow-band signal and the short-time frequency spectrum of the broadband signal, estimating the spectrum envelope of the high-frequency component of the short-time frequency spectrum of the broadband signal according to a preset error criterion, and combining the low-frequency spectrum envelope and the spectrum details after being expanded by adopting a proper spectrum repairing method to obtain the plurality of point sound sources after bandwidth expansion.

5. The method of bandwidth extension for stereo audio according to claim 4, wherein said fitting the mapping relation between the short-time frequency spectrum of the narrow-band signal and the short-time frequency spectrum of the broadband signal in the state space model and estimating the spectrum envelope of the high-frequency component according to a preset error criterion comprises:

obtaining a hidden state vector in the preset state space model by using the hidden state vector at the previous moment and the short-time frequency spectrum of the narrowband signal at the previous moment;

and obtaining the short-time frequency spectrum of the broadband signal by using the hidden state vector in the preset state space model and the short-time frequency spectrum of the narrow-band signal at the current moment.

6. An apparatus for bandwidth extension of stereo audio, comprising: a decomposition module, a diffuse sound expansion module, a direct sound separation and expansion module, and a reconstruction module;

the decomposition module is used for decomposing the stereo signal into direct sound and diffuse sound;

the diffuse sound expansion module is used for performing bandwidth expansion on diffuse sound according to a preset frequency band expansion method;

the direct sound separation and expansion module is used for separating the direct sound into a plurality of point sound sources in different directions, and performing bandwidth expansion on the plurality of point sound sources respectively to obtain a plurality of point sound sources after bandwidth expansion;

and the reconstruction module is used for remixing the plurality of point sound sources after the bandwidth expansion according to the pre-estimated azimuth information to obtain direct sound after the bandwidth expansion, and reconstructing a broadband stereo audio signal according to the direct sound after the bandwidth expansion and the diffuse sound after the bandwidth expansion.

7. The apparatus of claim 6, wherein the decomposition module is specifically configured to:

decomposing the stereo signal into a left channel and a right channel;

separating direct sound from the stereo signal using the direct sound matrix;

8. The stereo audio bandwidth extension device of claim 6, wherein the direct sound separation and extension module is specifically configured to:

9. The stereo audio bandwidth extension device of claim 6, wherein the direct sound separation and extension module is specifically configured to:

respectively inputting a plurality of point sound sources into a preset state space model to fit the mapping relation between the short-time frequency spectrum of the narrow-band signal and the short-time frequency spectrum of the broadband signal, estimating the spectrum envelope of the high-frequency component of the short-time frequency spectrum of the broadband signal according to a preset error criterion, and obtaining the direct sound after bandwidth expansion by combining the low-frequency spectrum envelope and the spectrum details after expansion by adopting a proper spectrum repairing method.

10. The stereo audio bandwidth extension device of claim 9, wherein the direct sound separation and extension module is specifically configured to: