CN106960672B - Bandwidth extension method and device for stereo audio - Google Patents
- Publication number
- CN106960672B (CN201710203054.1A)
- Authority
- CN
- China
- Prior art keywords
- sound
- signal
- direct sound
- short
- time frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
- G10L21/0388—Details of processing therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Stereophonic System (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of network technology applications, and in particular to a bandwidth extension method and device for stereo audio.
Background
In digital audio signal processing, an audio signal that covers the entire frequency range perceivable by the human ear, 20 Hz to 20 kHz, is usually called full-band audio; such signals are mainly used for high-fidelity reproduction of music. Current real-time audio communication systems cannot provide sufficient network transmission rates or terminal processing capability, so they inevitably limit the effective bandwidth of the reconstructed signal and give priority to quantizing and encoding the low-frequency components of the audio signal, thereby improving the coding efficiency of the audio communication system.
Traditional telephone voice communication systems usually transmit narrowband signals whose frequency content lies in the range 300 to 3400 Hz, sampled at 8 kHz. Subjective listening tests show that narrowband speech retains 91% syllable intelligibility and 99% sentence intelligibility. Compared with real speech, however, the naturalness and subjective quality of the narrowband signal transmitted in an actual call are clearly degraded. Because the high-frequency components are missing, narrowband speech cannot distinguish some unvoiced sounds or plosives well, and its ability to convey speaker characteristics is weakened. To overcome these shortcomings, wideband audio has been widely adopted in telephone voice communication. Its effective bandwidth extends from 50 Hz to 7 kHz, which covers most of the spectrum that characterizes the important properties of speech and achieves sound quality close to that of AM broadcasting. Constrained by historical, economic and technical factors, however, traditional fixed and mobile communications still need a fairly long transition period to move completely from narrowband to wideband audio.
As an effective audio enhancement method, band extension leaves the source coding and network transmission of the narrowband signal unchanged. By analyzing the time-frequency characteristics of the original audio signal, it artificially restores in the reconstructed wideband audio, at the receiving end, the high-frequency components that were truncated at the encoder, thereby improving the perceived quality of the reconstructed audio. For hearing-impaired listeners, band extension can further improve the ability to discriminate phonemes and meaning. Over the past decade or so, many research institutions and researchers have proposed solutions for band extension of monophonic speech signals. These methods usually synthesize the high-frequency components of the signal from two aspects, spectral envelope extension and spectral detail extension, as shown in FIG. 1. First, time-frequency features are extracted from the narrowband signal according to the principles of human auditory perception. Next, the spectral envelope and energy of the high-frequency components are estimated using the mapping between low-frequency and high-frequency features described by side information or prior knowledge. At the same time, a suitable spectral patching method is selected to extend the spectral detail. Finally, the extended spectral envelope and spectral detail are combined to reconstruct the high-frequency components of the wideband audio signal.
For stereo audio, traditional band extension methods mostly reconstruct the high-frequency components of the two channels independently. Such methods extend the signal bandwidth based only on the subjective quality of the reconstructed signal of a single channel and ignore the correlation of signal energy and phase between the two channels. The resulting stereo signal seriously impairs the listener's judgment of the position and distance of the sound source.
Summary of the Invention
In view of the above problems, the present invention provides a bandwidth extension method and device for stereo audio.
The bandwidth extension method for stereo audio provided by the present invention comprises the following steps:
decomposing the stereo signal into direct sound and diffuse sound;
performing bandwidth extension on the diffuse sound according to a preset band extension method;
separating the direct sound into a plurality of point sound sources at different azimuths, and performing bandwidth extension on each point sound source, to obtain a plurality of bandwidth-extended point sound sources;
remixing the bandwidth-extended point sound sources according to pre-estimated azimuth information, to obtain bandwidth-extended direct sound;
reconstructing a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
The present invention further provides a bandwidth extension device for stereo audio, comprising a decomposition module, a diffuse sound extension module, a direct sound separation and extension module, and a reconstruction module;
the decomposition module is configured to decompose the stereo signal into direct sound and diffuse sound;
the diffuse sound extension module is configured to perform bandwidth extension on the diffuse sound according to a preset band extension method;
the direct sound separation and extension module is configured to separate the direct sound into a plurality of point sound sources at different azimuths, and to perform bandwidth extension on each point sound source, to obtain a plurality of bandwidth-extended point sound sources;
the reconstruction module is configured to remix the bandwidth-extended point sound sources according to pre-estimated azimuth information, to obtain bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
The beneficial effects of the present invention are as follows:
An embodiment of the present invention first uses the inter-channel spectral correlation to decompose the input stereo signal into two components, direct sound and diffuse sound. The diffuse sound component is extended directly with a traditional band extension method. The direct sound is separated into a plurality of point sound sources at different azimuths based on the sparsity of the different sources in the time-frequency structure, and each point source is bandwidth-extended separately. Finally, the extended point sources are remixed according to their azimuth information in the original stereo signal and combined with the bandwidth-extended diffuse sound component to reconstruct a wideband stereo audio signal. The invention thereby addresses the problem in the prior art that the signal bandwidth is extended based only on the subjective quality of the reconstructed signal of a single channel, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of the position and distance of the sound source.
Description of the Drawings
FIG. 1 is a basic flowchart of a prior-art band extension method for monophonic speech signals;
FIG. 2 is a flowchart of the bandwidth extension method for stereo audio according to the method embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the bandwidth extension device for stereo audio according to the device embodiment of the present invention;
FIG. 4 is a schematic block diagram of the bandwidth extension method for stereo audio of Example 1 of the present invention;
FIG. 5 is a schematic block diagram of the state space model based on a deep neural network in Example 1 of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope conveyed fully to those skilled in the art.
In the prior art, the signal bandwidth is extended based only on the subjective quality of the reconstructed signal of a single channel, without considering the correlation of signal energy and phase between the two channels, so the reconstructed stereo signal seriously impairs the listener's judgment of the position and distance of the sound source. To solve this problem, the present invention provides a bandwidth extension method and device for stereo audio, described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
According to a method embodiment of the present invention, a bandwidth extension method for stereo audio is provided. FIG. 2 is a flowchart of the method; as shown in FIG. 2, the method comprises the following processing:
Step 201: decompose the stereo signal into direct sound and diffuse sound.
Specifically, step 201 comprises the following steps:
splitting the stereo signal into a left channel and a right channel;
framing the left and right channels and applying a time-frequency transform to each, to obtain the left-channel and right-channel short-time spectral components of the stereo signal;
from the left-channel and right-channel short-time spectral components, obtaining the sum $P_{sum}$ of the left- and right-channel signal energy spectra, the difference $P_{diff}$ of the left- and right-channel signal energy spectra, and the cross-correlation $P_{cc}$ between the left and right channels;
obtaining the direct sound matrix from $P_{sum}$, $P_{diff}$ and $P_{cc}$ by the least-squares method;
separating the direct sound from the stereo signal using the direct sound matrix;
subtracting the direct sound from the stereo signal to obtain the diffuse sound.
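The first two sub-steps above (channel split, then framing and time-frequency transform) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Hann window, the frame length of 512, the hop of 256, the helper name `stft` and the synthetic test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Frame a 1-D signal, apply a Hann window, and take the FFT of each frame.

    Returns an array of shape (num_frames, frame_len // 2 + 1).
    Frame length, hop size and window are illustrative assumptions.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(num_frames)])
    return np.fft.rfft(frames * window, axis=1)

# Hypothetical stereo input: 1 s of a 440 Hz tone, louder on the left.
fs = 16000
n = np.arange(fs)
tone = np.sin(2 * np.pi * 440 * n / fs)
stereo = np.stack([0.8 * tone, 0.3 * tone], axis=1)   # shape (samples, 2)

S_L = stft(stereo[:, 0])   # left-channel short-time spectrum S_L(t, f)
S_R = stft(stereo[:, 1])   # right-channel short-time spectrum S_R(t, f)
```

Each row of `S_L` and `S_R` is one time frame t and each column one frequency bin f, which is the indexing the later steps rely on.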
More specifically, obtaining the sum $P_{sum}$, the difference $P_{diff}$ and the cross-correlation $P_{cc}$ from the left-channel and right-channel short-time spectral components comprises:
using the left-channel short-time spectral component $S_L(t,f)$ and the right-channel short-time spectral component $S_R(t,f)$, computing the sum of the left- and right-channel signal energy spectra as $P_{sum} = |S_L(t,f)|^2 + |S_R(t,f)|^2$;
using $S_L(t,f)$ and $S_R(t,f)$, computing the difference of the left- and right-channel signal energy spectra as $P_{diff} = |S_L(t,f)|^2 - |S_R(t,f)|^2$;
using $S_L(t,f)$ and $S_R(t,f)$, computing the cross-correlation between the left and right channels as $P_{cc} = R\{S_L(t,f)\,S_R^*(t,f)\}$, where $R\{\cdot\}$ takes the real part.
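The three quantities translate directly into array operations. The sketch below evaluates them per time-frequency bin; the function name `channel_statistics` and the two-bin toy input are assumptions made for illustration.

```python
import numpy as np

def channel_statistics(S_L, S_R):
    """Per-bin sum, difference and real cross-correlation of two complex spectra."""
    P_sum = np.abs(S_L) ** 2 + np.abs(S_R) ** 2
    P_diff = np.abs(S_L) ** 2 - np.abs(S_R) ** 2
    P_cc = np.real(S_L * np.conj(S_R))
    return P_sum, P_diff, P_cc

# Toy spectra with one frame and two bins (values chosen by hand).
S_L = np.array([[1 + 1j, 2 + 0j]])
S_R = np.array([[1 - 1j, 1 + 1j]])
P_sum, P_diff, P_cc = channel_statistics(S_L, S_R)
```

In the first bin the two channels have equal energy and are in quadrature, so both $P_{diff}$ and $P_{cc}$ vanish; in the second bin the left channel dominates and the channels are partly in phase.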
More specifically, separating the direct sound from the stereo signal using the direct sound matrix comprises:
using the direct sound matrix $M_D(t,f)$ to separate the direct sound $S'(t,f)$ from the stereo signal $S(t,f)$ according to Formula 1:
$S'(t,f) = M_D(t,f)\,[S_L(t,f)\;\; S_R(t,f)]^T$ (Formula 1).
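Formula 1 is a 2x2 matrix product applied independently at every time-frequency bin, and the diffuse sound is what remains after subtraction. The sketch below shows only that mechanics. The matrix used in the example is a hand-picked placeholder (a projection onto the in-phase part of the two channels), not the least-squares matrix of the patent, and the function name `split_direct_diffuse` is an assumption.

```python
import numpy as np

def split_direct_diffuse(S, M_D):
    """Apply a per-bin 2x2 matrix and return (direct, diffuse).

    S   : (T, F, 2) stereo spectrum, last axis = [left, right]
    M_D : (T, F, 2, 2) one matrix per time-frequency bin
    """
    S_direct = np.einsum("tfij,tfj->tfi", M_D, S)   # S' = M_D S at every bin
    S_diffuse = S - S_direct                        # S'' = S - S'
    return S_direct, S_diffuse

# One frame, two bins: a fully in-phase bin and a fully out-of-phase bin.
S = np.array([[[1.0, 1.0], [1.0, -1.0]]], dtype=complex)
# Placeholder matrix (assumption): keeps only the part common to both channels.
M_D = np.broadcast_to(0.5 * np.ones((2, 2)), (1, 2, 2, 2))
S_direct, S_diffuse = split_direct_diffuse(S, M_D)
```

With this placeholder the in-phase bin is assigned entirely to the direct sound and the out-of-phase bin entirely to the diffuse sound, and the two parts always add back to the input.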
Step 202: perform bandwidth extension on the diffuse sound according to a preset band extension method.
Specifically, step 202 applies a traditional band extension method directly to the diffuse sound; this is not described further here.
Step 203: separate the direct sound into a plurality of point sound sources at different azimuths, and perform bandwidth extension on each point sound source, to obtain a plurality of bandwidth-extended point sound sources.
Specifically, separating the direct sound into a plurality of point sound sources at different azimuths in step 203 comprises:
computing the direction information of the direct sound at every time-frequency point and clustering the direction information of all time-frequency points, to obtain cluster centers of the direction information, where the cluster centers correspond to the direction information of the individual point sound sources;
obtaining a masking matrix from the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;
separating the direct sound using the masking matrix, to obtain a plurality of point sound sources at different azimuths.
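The direction estimation and clustering steps can be sketched as below. The patent's own direction formula is not available in this text, so the example substitutes a common amplitude-panning angle, `arctan2(|S_R'|, |S_L'|)`, and a plain one-dimensional k-means; both choices, the function names and the synthetic two-source input are assumptions made for illustration.

```python
import numpy as np

def panning_angle(S_direct):
    """Hypothetical direction measure: 0 = hard left, pi/2 = hard right."""
    return np.arctan2(np.abs(S_direct[..., 1]), np.abs(S_direct[..., 0]))

def kmeans_1d(values, num_clusters, num_iters=50):
    """Plain Lloyd iterations on scalar values; returns sorted cluster centers."""
    values = np.asarray(values, dtype=float).ravel()
    # Initialise the centers at evenly spaced quantiles of the data.
    centers = np.quantile(values, (np.arange(num_clusters) + 0.5) / num_clusters)
    for _ in range(num_iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for k in range(num_clusters):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return np.sort(centers)

# Synthetic direct sound: every bin is owned by one of two panned sources.
rng = np.random.default_rng(0)
amp = rng.uniform(0.5, 1.5, size=(4, 6))           # per-bin source amplitude
owner = np.arange(24).reshape(4, 6) % 2            # which source owns each bin
gains = np.array([[1.0, 0.2], [0.3, 1.0]])         # [left, right] gain per source
S_direct = amp[..., None] * gains[owner]           # shape (T, F, 2)

theta = panning_angle(S_direct)
centers = kmeans_1d(theta, num_clusters=2)
```

Because each bin is dominated by a single source (the time-frequency sparsity the method relies on), the per-bin angles collapse onto one value per source and the cluster centers recover the two panning directions.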
Specifically, performing bandwidth extension on each of the point sound sources comprises:
feeding each point sound source into a preset state space model that fits the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal; estimating the spectral envelope of the high-frequency components of the wideband short-time spectrum according to a preset error criterion; and combining the low-frequency spectral envelope with spectral detail extended by a suitable spectral patching method, to obtain a plurality of bandwidth-extended point sound sources.
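The text leaves the spectral patching method open. One simple possibility, shown here purely as an assumed example, is spectral translation: the low-band bins are copied into the high band to supply spectral detail, and each copied sub-band is rescaled to the RMS level given by the estimated high-band envelope. The function name `patch_high_band`, the equal-width sub-bands and the toy values are hypothetical.

```python
import numpy as np

def patch_high_band(X_low, env_high):
    """Copy the low band upward and rescale each copied sub-band to a target RMS.

    X_low    : complex low-band spectrum of one frame, length F
    env_high : target RMS magnitude for each of B equal-width high-band
               sub-bands (F must be divisible by B)
    Returns the length-2F spectrum [low band, synthesised high band].
    """
    F, B = len(X_low), len(env_high)
    bands = X_low.reshape(B, F // B).copy()            # spectral detail taken from the low band
    rms = np.sqrt(np.mean(np.abs(bands) ** 2, axis=1, keepdims=True))
    bands *= np.asarray(env_high, dtype=float)[:, None] / np.maximum(rms, 1e-12)
    return np.concatenate([X_low, bands.ravel()])

X_low = np.array([1, 2, 3, 4], dtype=complex)          # toy low-band frame
env_high = [0.5, 0.1]                                  # assumed envelope estimate
X_wide = patch_high_band(X_low, env_high)
```

The low band passes through unchanged, while the synthesized high band inherits the fine structure of the low band and the level of the estimated envelope.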
More specifically, fitting the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal in the state space model, and estimating the spectral envelope of the high-frequency components according to the preset error criterion, comprises:
obtaining the hidden state vector of the state space model from the hidden state vector at the previous time step and the short-time spectrum of the narrowband signal at the previous time step;
obtaining the short-time spectrum of the wideband signal from the hidden state vector of the state space model and the short-time spectrum of the narrowband signal at the current time step.
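The two-step recursion above can be sketched as a loop over frames. FIG. 5 indicates that the patent uses deep neural networks inside the state space model; the sketch below replaces them with single linear layers and a `tanh` nonlinearity, with random weights, purely to show the data flow. The weight names, dimensions and the zero "previous frame" before the first frame are assumptions.

```python
import numpy as np

def run_state_space(X_nb, params, h0):
    """X_nb: (T, dim_nb) narrowband spectra; returns (T, dim_wb) wideband estimates.

    h_t = tanh(W_hh h_{t-1} + W_xh x_{t-1})   state from previous state and previous frame
    y_t = W_hy h_t + W_xy x_t                 output from current state and current frame
    """
    W_hh, W_xh, W_hy, W_xy = params
    h_prev = h0
    x_prev = np.zeros(X_nb.shape[1])          # no frame precedes the first one
    outputs = []
    for x_t in X_nb:
        h_t = np.tanh(W_hh @ h_prev + W_xh @ x_prev)
        outputs.append(W_hy @ h_t + W_xy @ x_t)
        h_prev, x_prev = h_t, x_t
    return np.stack(outputs)

# Random stand-in weights and input (illustrative only, not trained).
rng = np.random.default_rng(0)
dim_nb, dim_wb, dim_h = 4, 8, 3
params = (rng.normal(size=(dim_h, dim_h)), rng.normal(size=(dim_h, dim_nb)),
          rng.normal(size=(dim_wb, dim_h)), rng.normal(size=(dim_wb, dim_nb)))
X_nb = rng.normal(size=(5, dim_nb))
Y_wb = run_state_space(X_nb, params, h0=np.zeros(dim_h))
```

The hidden state carries information from earlier frames forward, so each wideband estimate depends on the current narrowband frame and on the history summarized in the state.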
Step 204: remix the bandwidth-extended point sound sources according to the pre-estimated azimuth information to obtain the bandwidth-extended direct sound, and reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
Specifically, the pre-estimated azimuth information is estimated from the cluster centers of the direction information; the estimation method is a conventional technique in the field and is not described further here.
Specifically, the wideband stereo audio signal is reconstructed from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound using Formula 2.
Formula 2 relates three quantities: the short-time spectrum of the bandwidth-extended stereo signal, the short-time spectrum of the bandwidth-extended direct sound, and the short-time spectrum of the bandwidth-extended diffuse sound.
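Formula 2 itself does not appear in this text. Since the decomposition defines the diffuse sound as the stereo signal minus the direct sound, one form consistent with that definition is a simple sum of the two extended components. The hatted symbols below are notation assumed for this sketch, not taken from the patent:

```latex
% Assumed notation: a hat marks a bandwidth-extended short-time spectrum.
% \hat{S}(t,f)   : bandwidth-extended stereo signal
% \hat{S}'(t,f)  : bandwidth-extended direct sound
% \hat{S}''(t,f) : bandwidth-extended diffuse sound
\hat{S}(t,f) = \hat{S}'(t,f) + \hat{S}''(t,f)
```

This mirrors the analysis-side relation $S(t,f) = S'(t,f) + S''(t,f)$, so an unmodified direct and diffuse pair would reproduce the input exactly.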
Corresponding to the method embodiment of the present invention, a bandwidth extension device for stereo audio is provided. FIG. 3 is a schematic structural diagram of the device; as shown in FIG. 3, the device comprises a decomposition module 30, a diffuse sound extension module 32, a direct sound separation and extension module 34, and a reconstruction module 36. Each module is described in detail below.
Specifically, the decomposition module 30 is configured to decompose the stereo signal into direct sound and diffuse sound;
the diffuse sound extension module 32 is configured to perform bandwidth extension on the diffuse sound according to a preset band extension method;
the direct sound separation and extension module 34 is configured to separate the direct sound into a plurality of point sound sources at different azimuths, and to perform bandwidth extension on each point sound source, to obtain a plurality of bandwidth-extended point sound sources;
the reconstruction module 36 is configured to remix the bandwidth-extended point sound sources according to pre-estimated azimuth information to obtain the bandwidth-extended direct sound, and to reconstruct a wideband stereo audio signal from the bandwidth-extended direct sound combined with the bandwidth-extended diffuse sound.
The decomposition module 30 is specifically configured to:
split the stereo signal into a left channel and a right channel;
frame the left and right channels and apply a time-frequency transform to each, to obtain the left-channel and right-channel short-time spectral components of the stereo signal;
from the left-channel and right-channel short-time spectral components, obtain the sum $P_{sum}$ of the left- and right-channel signal energy spectra, the difference $P_{diff}$ of the left- and right-channel signal energy spectra, and the cross-correlation $P_{cc}$ between the left and right channels;
obtain the direct sound matrix from $P_{sum}$, $P_{diff}$ and $P_{cc}$ by the least-squares method;
separate the direct sound from the stereo signal using the direct sound matrix;
subtract the direct sound from the stereo signal to obtain the diffuse sound.
The direct sound separation and extension module 34 is specifically configured to:
compute the direction information of the direct sound at every time-frequency point and cluster the direction information of all time-frequency points, to obtain cluster centers of the direction information, where the cluster centers correspond to the direction information of the individual point sound sources;
obtain a masking matrix from the direction information of the direct sound at a given time-frequency point and the cluster centers of the direction information;
separate the direct sound using the masking matrix, to obtain a plurality of point sound sources at different azimuths.
The direct sound separation and extension module 34 is further specifically configured to:
feed each point sound source into a preset state space model that fits the mapping between the short-time spectrum of the narrowband signal and the short-time spectrum of the wideband signal; estimate the spectral envelope of the high-frequency components of the wideband short-time spectrum according to a preset error criterion; and combine the low-frequency spectral envelope with spectral detail extended by a suitable spectral patching method, finally obtaining the bandwidth-extended direct sound.
To describe the technical solution of the present invention in more detail, Example 1 is given. FIG. 4 is a schematic block diagram of the bandwidth extension method for stereo audio of Example 1; as shown in FIG. 4, the method comprises the following steps:
1. Direct sound/diffuse sound separation
The stereo extension system proposed here uses a discrete Fourier transform or a quadrature mirror filter bank to convert the framed left- and right-channel audio signals to the frequency domain, and divides the spectrum into multiple uniform sub-bands or critical bands according to the principles of human auditory perception. The short-time spectrum $S(t,f)$ of the input stereo signal can then be written as $S(t,f) = [S_L(t,f)\;\; S_R(t,f)]^T$,
where $t$ and $f$ denote the time frame index and the sub-band index of the signal, and $S_L(t,f)$ and $S_R(t,f)$ denote the left- and right-channel short-time spectral components of the stereo signal.
To separate the direct sound and the diffuse sound effectively, the system also computes the sum $P_{sum}$ and difference $P_{diff}$ of the left- and right-channel signal energy spectra, and the cross-correlation $P_{cc}$ of the two channels:
$P_{sum} = |S_L(t,f)|^2 + |S_R(t,f)|^2$
$P_{diff} = |S_L(t,f)|^2 - |S_R(t,f)|^2$
$P_{cc} = R\{S_L(t,f)\,S_R^*(t,f)\}$
where $R\{\cdot\}$ takes the real part. To improve the stability of the separation algorithm, the computed $P_{sum}$, $P_{diff}$ and $P_{cc}$ are each smoothed over time.
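The text does not say which smoother is used. A common choice, assumed here for illustration, is a first-order recursive average along the time axis, $\bar{P}(t,f) = \alpha\,\bar{P}(t-1,f) + (1-\alpha)\,P(t,f)$; the smoothing constant `alpha = 0.8` and the function name are hypothetical.

```python
import numpy as np

def smooth_over_time(P, alpha=0.8):
    """First-order recursive average of a (T, F) array along the time axis.

    alpha close to 1 means heavy smoothing; the value 0.8 is an assumption.
    """
    P = np.asarray(P, dtype=float)
    P_smooth = np.empty_like(P)
    P_smooth[0] = P[0]                       # start from the first frame
    for t in range(1, len(P)):
        P_smooth[t] = alpha * P_smooth[t - 1] + (1.0 - alpha) * P[t]
    return P_smooth

# An isolated spike in one bin decays geometrically instead of vanishing at once.
P = np.array([[1.0], [0.0], [0.0]])
P_smooth = smooth_over_time(P)
```

The same function would be applied separately to $P_{sum}$, $P_{diff}$ and $P_{cc}$, which keeps frame-to-frame fluctuations from destabilizing the direct sound matrix.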
立体声左右声道中的直达声成分之间高度相关,并可表示为由某一方向传播来的点声源信号。据此,本文所提系统利用一个直达声矩阵从原始立体声双声道信号S(t,f)中直接分离出直达声成分S'(t,f),如下式所示,The direct sound components in the stereo left and right channels are highly correlated and can be represented as point sound source signals propagating from a certain direction. Accordingly, the system proposed in this paper uses a direct sound matrix to directly separate the direct sound component S'(t, f) from the original stereo binaural signal S(t, f), as shown in the following formula:
S'(t,f)=[SL'(t,f)SR'(t,f)]T=MD(t,f)[SL(t,f)SR(t,f)]T=MD(t,f)S(t,f)S'(t,f)=[S L '(t,f)S R '(t,f)] T =MD (t,f)[ S L (t,f)S R (t,f) ] T = M D (t,f)S(t,f)
where $S_L'(t,f)$ and $S_R'(t,f)$ denote the left- and right-channel short-time spectral components of the direct sound, and $M_D(t,f)$ is the direct sound matrix. According to [M. Vinton, D. McGrath, C. Robinson, P. Brown, "Next generation surround decoding and upmixing for consumer and professional applications," AES 57th International Conference, USA, 2015], the direct sound matrix $M_D(t,f)$ can be obtained by the least-squares method, so that the expected squared error between the estimated direct sound component and the true component is minimized, that is,
The direct sound matrix $M_D(t,f)$ can then be calculated by the following formula,
The diffuse sound component $S''(t,f)$ can then be expressed as the difference between the original stereo signal and the direct sound component,
$$S''(t,f) = S(t,f) - S'(t,f)$$
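For a single time-frequency point, the decomposition into direct and diffuse sound amounts to one 2×2 matrix product and one subtraction. The sketch below assumes the direct sound matrix $M_D(t,f)$ has already been estimated and is passed in as a nested list; the function name `split_direct_diffuse` is hypothetical, and the matrix in the usage note is an arbitrary example value, not the least-squares solution.

```python
def split_direct_diffuse(s, m_d):
    """Split one time-frequency point into direct and diffuse parts.

    s: pair (S_L, S_R) of complex short-time spectrum values.
    m_d: 2x2 direct sound matrix M_D(t, f) as [[a, b], [c, d]],
         assumed to be given (its estimation is not shown here).
    Returns (direct, diffuse), each a pair (left, right).
    """
    s_l, s_r = s
    # S'(t, f) = M_D(t, f) S(t, f)
    direct = (
        m_d[0][0] * s_l + m_d[0][1] * s_r,
        m_d[1][0] * s_l + m_d[1][1] * s_r,
    )
    # S''(t, f) = S(t, f) - S'(t, f)
    diffuse = (s_l - direct[0], s_r - direct[1])
    return direct, diffuse
```

For instance, calling `split_direct_diffuse((2+0j, 0j), [[0.5, 0.5], [0.5, 0.5]])` treats the part common to both channels as direct sound and leaves the remainder as diffuse sound. By construction, the two parts always add back up to the input.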
2. Sound source separation of the direct sound component
From $S'(t,f) = [S_L'(t,f)\ \ S_R'(t,f)]^T$, Formula 3 gives the direction information $\theta(t,f)$ of the direct sound $S'(t,f)$ at a given time-frequency point; this direction information is the same as the direction information $\theta_i$ of the corresponding point sound source;
The direction information $\theta(t,f)$ of all time-frequency points is clustered to obtain the cluster centers $C_i$, $i = 1, 2, \ldots, N$; these cluster centers correspond to the direction information $\theta_1, \theta_2, \theta_3, \ldots, \theta_N$ of the point sound sources $S_1(t,f), S_2(t,f), S_3(t,f), \ldots, S_N(t,f)$;
The masking matrix $m_i(t,f)$ is obtained from the direction information $\theta(t,f)$ of the direct sound $S'(t,f)$ at each time-frequency point and the cluster centers $C_i$;
Using the masking matrix $m_i(t,f)$, the direct sound $S'(t,f)$ is separated according to Formula 4 to obtain the direct point sound sources.
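The three steps above (direction estimation, clustering, masking) can be sketched as follows. Formulas 3 and 4 are not reproduced in this section, so every concrete choice here is an assumption made for illustration: the direction is taken as the amplitude-panning angle `atan2(|S_R'|, |S_L'|)`, the clustering is a plain one-dimensional k-means, and the mask is binary (each time-frequency point is assigned entirely to its nearest cluster center). All function names are hypothetical.

```python
import math


def direction(s_l, s_r):
    """Assumed direction estimate: panning angle in [0, pi/2]."""
    return math.atan2(abs(s_r), abs(s_l))


def kmeans_1d(values, centers, iterations=20):
    """Plain 1-D k-means; `centers` holds the initial guesses."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Keep the old center when a cluster receives no points.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def separate_sources(direct_bins, centers):
    """Apply a binary mask per cluster center to the direct sound.

    direct_bins: list of (S_L', S_R') pairs, one per time-frequency point.
    Returns one list of masked (left, right) pairs per cluster center.
    """
    sources = [[] for _ in centers]
    for s_l, s_r in direct_bins:
        theta = direction(s_l, s_r)
        nearest = min(range(len(centers)), key=lambda i: abs(theta - centers[i]))
        for i, src in enumerate(sources):
            mask = 1.0 if i == nearest else 0.0
            src.append((mask * s_l, mask * s_r))
    return sources
```

A binary mask relies on the sources rarely overlapping at the same time-frequency point, which is the sparsity property the method depends on; a soft mask would be an alternative when that assumption is weak.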
3. Bandwidth extension
Following the method described above, the diffuse sound component $S''(t,f)$ and the direct sound component $S'(t,f)$ are separated from the stereo signal, and time-frequency sparsity is used to further separate the direct sound component $S'(t,f)$ into multiple point sound sources. The diffuse sound $S''(t,f)$ and each direct point sound source can then be bandwidth-extended independently using a mono band extension method.
Here a state-space model is used to fit the mapping between narrowband and wideband spectral parameters directly, and during the actual extension the spectral envelope of the high-frequency components is estimated according to a given error criterion,
$$S_Y(t,f) = F[S_X(t,f)]$$
where $S_X(t,f)$ and $S_Y(t,f)$ denote the short-time spectra of the narrowband and wideband signals, respectively, and $F[\cdot]$ denotes the mapping (or estimation) function.
According to the state-space model, the mapping function $F[\cdot]$ can be described by two processes, a state evolution function $F_{state}[\cdot]$ and an observation function $F_{obs}[\cdot]$, as shown below,
$$S_{hidden}(t,f) = F_{state}[S_{hidden}(t-1,f),\ S_X(t-1,f),\ N_1(t,f)]$$
$$S_Y(t,f) = F_{obs}[S_{hidden}(t,f),\ S_X(t,f),\ N_2(t,f)]$$
where $S_{hidden}(t,f)$ is the hidden state vector of the model, and $N_1(t,f)$ and $N_2(t,f)$ describe the errors of the state evolution function $F_{state}$ and the observation function $F_{obs}$, respectively. In this model, the hidden state vector $S_{hidden}(t,f)$ at the current frame is determined by the hidden state vector $S_{hidden}(t-1,f)$ and the narrowband short-time spectrum $S_X(t-1,f)$ of the previous frame, while the wideband short-time spectrum $S_Y(t,f)$ at the current frame is determined by the current hidden state vector $S_{hidden}(t,f)$ and the current narrowband short-time spectrum $S_X(t,f)$. The recursive hidden-state structure of the state-space model allows the complex mapping between narrowband and wideband spectral parameters to be fitted more accurately. The model can be implemented with a generalized Kalman filtering method, or with two mutually independent deep neural networks. The basic principle of the state-space model based on deep neural networks is shown in Figure 5. Here, the state evolution function $F_{state}$ and the observation function $F_{obs}$ can be implemented with various feedforward and recurrent deep neural networks, such as stacked autoencoders, multilayer perceptrons, time-delay recurrent networks, and long short-term memory networks.
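The recursion defined by the two state-space equations can be sketched as a simple loop over frames. This is only an illustration of the data flow: the error terms $N_1$ and $N_2$ are omitted, the initial hidden state is assumed to be supplied by the caller, and `toy_state` and `toy_obs` are hypothetical linear stand-ins for the trained $F_{state}$ and $F_{obs}$ (for example, two neural networks).

```python
def extend_band(narrow_frames, f_state, f_obs, hidden0):
    """Run the state-space recursion over a sequence of narrowband frames.

    narrow_frames: list of narrowband spectra S_X(t), one list per frame.
    f_state(hidden_prev, narrow_prev) -> hidden state for the current frame.
    f_obs(hidden, narrow) -> wideband spectrum S_Y(t) for the current frame.
    hidden0: hidden state used for the first frame (assumed to be given).
    """
    hidden = hidden0
    prev_narrow = None
    wide_frames = []
    for narrow in narrow_frames:
        if prev_narrow is not None:
            # S_hidden(t) depends on S_hidden(t-1) and S_X(t-1).
            hidden = f_state(hidden, prev_narrow)
        # S_Y(t) depends on S_hidden(t) and S_X(t).
        wide_frames.append(f_obs(hidden, narrow))
        prev_narrow = narrow
    return wide_frames


def toy_state(hidden, prev_narrow):
    """Hypothetical state update: average of old state and previous input."""
    return [0.5 * h + 0.5 * x for h, x in zip(hidden, prev_narrow)]


def toy_obs(hidden, narrow):
    """Hypothetical observation: keep the low band, append a high band."""
    return list(narrow) + [0.5 * h for h in hidden]
```

Keeping the two functions as separate callables mirrors the option, mentioned above, of implementing them as two independent networks.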
4. Stereo signal synthesis
Using the mono band extension method, the diffuse sound $S''(t,f)$ and each of the $N$ direct point sound sources can be extended separately to obtain the corresponding wideband spectra $S_Y(t,f)$. The direction information $\theta_i$ of each point sound source can then be used to reproduce the wideband direct sound from the extended wideband spectra of the point sound sources.
Finally, combining the short-time spectrum of the extended wideband direct sound with the extended wideband diffuse sound reproduces the wideband stereo signal.
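The remixing step can be sketched as follows. The synthesis formulas are not reproduced in this section, so the panning law is an assumption: each extended point source is taken to be a mono spectrum that is distributed to the left and right channels with gains `cos(theta_i)` and `sin(theta_i)`, and the extended diffuse sound is added on top. The function name `synthesize_stereo` is hypothetical.

```python
import math


def synthesize_stereo(sources, thetas, diffuse):
    """Remix extended point sources and diffuse sound into stereo.

    sources: list of N lists of complex wideband values, one list per source.
    thetas: list of N direction angles theta_i in radians (0 = fully left,
            pi/2 = fully right under the assumed panning law).
    diffuse: list of (left, right) wideband diffuse-sound pairs.
    Returns a list of (left, right) pairs, one per time-frequency point.
    """
    output = []
    for k, (d_l, d_r) in enumerate(diffuse):
        # Assumed amplitude panning: cos/sin gains preserve each source's power.
        direct_l = sum(math.cos(t) * src[k] for src, t in zip(sources, thetas))
        direct_r = sum(math.sin(t) * src[k] for src, t in zip(sources, thetas))
        output.append((d_l + direct_l, d_r + direct_r))
    return output
```

Reusing the directions estimated from the original narrowband signal is what keeps each source at its original position after extension.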
The embodiment of the present invention first uses the spectral correlation between channels to decompose the input stereo signal into a direct sound component and a diffuse sound component. The diffuse sound component is extended directly with a conventional band extension method. The direct sound is separated into multiple point sound sources at different azimuths, based on the sparsity of the different sound sources in the time-frequency structure, and each point sound source is bandwidth-extended separately. Finally, the extended point sound sources are remixed according to their azimuth information in the original stereo signal and combined with the bandwidth-extended diffuse sound component to reconstruct a wideband stereo audio signal. The invention addresses the problem that the prior art extends the signal bandwidth based only on the subjective quality of the reconstructed signal of a single channel, without considering the correlation of signal energy and phase between the two channels, so that the reconstructed stereo signal seriously impairs the listener's judgment of sound source position and distance.
The above description is only an embodiment of the present invention and is not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of the claims of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710203054.1A CN106960672B (en) | 2017-03-30 | 2017-03-30 | Bandwidth extension method and device for stereo audio |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710203054.1A CN106960672B (en) | 2017-03-30 | 2017-03-30 | Bandwidth extension method and device for stereo audio |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN106960672A (en) | 2017-07-18 |
| CN106960672B (en) | 2020-08-21 |
Family
ID=59470575
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710203054.1A (Expired - Fee Related) CN106960672B (en) | 2017-03-30 | 2017-03-30 | Bandwidth extension method and device for stereo audio |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106960672B (en) |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107886966A (en) * | 2017-10-30 | 2018-04-06 | 捷开通讯(深圳)有限公司 | Terminal and its method for optimization voice command, storage device |
| CN108152788A (en) * | 2017-12-22 | 2018-06-12 | 西安Tcl软件开发有限公司 | Sound-source follow-up method, sound-source follow-up equipment and computer readable storage medium |
| CN109975762B (en) * | 2017-12-28 | 2021-05-18 | 中国科学院声学研究所 | An underwater sound source localization method |
| CN110751956B (en) * | 2019-09-17 | 2022-04-26 | 北京时代拓灵科技有限公司 | Immersive audio rendering method and system |
| JP7605118B2 (en) * | 2019-09-24 | 2024-12-24 | ソニーグループ株式会社 | Signal processing device, signal processing method and program |
| CN116193350A (en) * | 2021-11-29 | 2023-05-30 | 广州视源电子科技股份有限公司 | Audio signal processing method, device, device and storage medium |
| CN116261086B (en) * | 2022-09-09 | 2026-01-20 | 深圳市中科蓝讯科技股份有限公司 | Sound signal processing method, device, equipment and storage medium |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5222059A (en) * | 1988-01-06 | 1993-06-22 | Lucasfilm Ltd. | Surround-sound system with motion picture soundtrack timbre correction, surround sound channel timbre correction, defined loudspeaker directionality, and reduced comb-filter effects |
| KR101435893B1 (en) * | 2006-09-22 | 2014-09-02 | 삼성전자주식회사 | METHOD AND APPARATUS FOR ENCODING / DECODING AUDIO SIGNAL USING BANDWIDTH EXTENSION METHOD AND Stereo Coding |
| CN102859590B (en) * | 2010-02-24 | 2015-08-19 | 弗劳恩霍夫应用研究促进协会 | Device for generating an enhanced down-mixing signal, method for generating an enhanced down-mixing signal, and computer program |
| CN102572676B (en) * | 2012-01-16 | 2016-04-13 | 华南理工大学 | A kind of real-time rendering method for virtual auditory environment |
| EP2645748A1 (en) * | 2012-03-28 | 2013-10-02 | Thomson Licensing | Method and apparatus for decoding stereo loudspeaker signals from a higher-order Ambisonics audio signal |
| CN104781880B (en) * | 2012-09-03 | 2017-11-28 | 弗劳恩霍夫应用研究促进协会 | The apparatus and method that multi channel speech for providing notice has probability Estimation |
| EP2733964A1 (en) * | 2012-11-15 | 2014-05-21 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Segment-wise adjustment of spatial audio signal to different playback loudspeaker setup |
| EP2782094A1 (en) * | 2013-03-22 | 2014-09-24 | Thomson Licensing | Method and apparatus for enhancing directivity of a 1st order Ambisonics signal |
| EP2884491A1 (en) * | 2013-12-11 | 2015-06-17 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Extraction of reverberant sound using microphone arrays |
| CN106531179B (en) * | 2015-09-10 | 2019-08-20 | 中国科学院声学研究所 | A Multi-Channel Speech Enhancement Method with Semantic Prior Based Selective Attention |
2017-03-30: CN201710203054.1A, patent CN106960672B (en), not active, Expired - Fee Related
Also Published As
| Publication number | Publication date |
|---|---|
| CN106960672A (en) | 2017-07-18 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN106960672B (en) | Bandwidth extension method and device for stereo audio | |
| US11664034B2 (en) | Optimized coding and decoding of spatialization information for the parametric coding and decoding of a multichannel audio signal | |
| EP2965540B1 (en) | Apparatus and method for multichannel direct-ambient decomposition for audio signal processing | |
| JP4963962B2 (en) | Multi-channel signal encoding apparatus and multi-channel signal decoding apparatus | |
| JP4832305B2 (en) | Stereo signal generating apparatus and stereo signal generating method | |
| CN102157156B (en) | Single-channel voice enhancement method and system | |
| EP1829424A1 (en) | Temporal envelope shaping of decorrelated signal | |
| WO2007028250A2 (en) | Method and device for binaural signal enhancement | |
| CN103038823B (en) | Systems and methods for speech extraction | |
| KR20120090086A (en) | Determining an upperband signal from a narrowband signal | |
| EP2612322A1 (en) | Method and apparatus for encoding/decoding multichannel audio signal | |
| AU2012280392B2 (en) | Method and apparatus for decomposing a stereo recording using frequency-domain processing employing a spectral weights generator | |
| US10412226B2 (en) | Audio signal processing apparatus and method | |
| US12069466B2 (en) | Systems and methods for audio upmixing | |
| Roman et al. | Pitch-based monaural segregation of reverberant speech | |
| CN119229875B (en) | Target voice extraction method and device based on multi-reference clue fusion | |
| Westhausen et al. | Reduction of subjective listening effort for TV broadcast signals with recurrent neural networks | |
| Hussain et al. | Towards intelligibility-oriented audio-visual speech enhancement | |
| Gaultier et al. | Recovering speech intelligibility with deep learning and multiple microphones in noisy-reverberant situations for people using cochlear implants | |
| Kallel et al. | A noise cross PSD estimator based on improved minimum statistics method for two-microphone speech enhancement dedicated to a bilateral cochlear implant | |
| WO2017202680A1 (en) | Method and apparatus for voice or sound activity detection for spatial audio | |
| Subramanya et al. | A graphical model for multi-sensory speech processing in air-and-bone conductive microphones | |
| Hsu et al. | Spectro-temporal subband wiener filter for speech enhancement | |
| CN115116465A (en) | Sound source separation method and sound source separation device | |
| Talagala et al. | Binaural localization of speech sources in the median plane using cepstral HRTF extraction |
Legal Events
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200821; Termination date: 20210330 |