HK1227164B - High band excitation signal generation

Info

Publication number: HK1227164B
Application number: HK17100481.7A
Authority: HK
Inventors: 库马尔拉马达斯普拉文; J 辛德尔丹尼尔; 皮埃尔维莱特斯特凡那; 拉金德朗维韦克
Original assignee: Qualcomm Incorporated (高通股份有限公司)
Priority date: 2014-04-30
Filing date: 2015-03-31
Publication date: 2020-10-23

Description

High-band excitation signal generation

Priority Declaration

This application claims priority to U.S. Application No. 14/265,693, filed April 30, 2014, entitled "HIGH BAND EXCITATION SIGNAL GENERATION," the contents of which are incorporated by reference in their entirety.

Technical Field

The present invention generally relates to high-band excitation signal generation.

Background Art

Advances in technology have led to smaller and more powerful computing devices. For example, there are currently a variety of portable personal computing devices, including wireless computing devices such as portable wireless telephones, personal digital assistants (PDAs), and paging devices, that are small, lightweight, and easily carried by users. More specifically, portable wireless telephones (e.g., cellular telephones and Internet Protocol (IP) telephones) can communicate voice and data packets over wireless networks. In addition, many of these wireless telephones incorporate other types of devices. For example, a wireless telephone may also include a digital still camera, a digital video camera, a digital recorder, and an audio file player.

Transmitting voice by digital techniques is widespread, particularly in long-distance and digital radiotelephone applications. If speech is transmitted by sampling and digitizing, a data rate on the order of 64 kilobits per second (kbps) may be used to achieve the speech quality of an analog telephone. Compression techniques can reduce the amount of information sent over a channel while maintaining the perceived quality of the reconstructed speech. By using speech analysis, followed by coding, transmission, and resynthesis at the receiver, a significant reduction in data rate can be achieved.

Devices for compressing speech are used in many fields of telecommunications. For example, wireless communications have many applications, including cordless telephones, paging, wireless local loops, wireless telephony (e.g., cellular and Personal Communications Service (PCS) telephone systems), mobile Internet Protocol (IP) telephony, and satellite communication systems. A particular application is wireless telephony for mobile subscribers.

Various air interfaces have been developed for wireless communication systems, including, for example, frequency division multiple access (FDMA), time division multiple access (TDMA), code division multiple access (CDMA), and time division synchronous CDMA (TD-SCDMA). Various domestic and international standards have been established in conjunction with these air interfaces, including, for example, Advanced Mobile Phone Service (AMPS), Global System for Mobile Communications (GSM), and Interim Standard 95 (IS-95). An exemplary wireless telephone communication system is a code division multiple access (CDMA) system. The IS-95 standard and its derivatives (IS-95A, ANSI J-STD-008, and IS-95B), collectively referred to herein as IS-95, were promulgated by the Telecommunications Industry Association (TIA) and other recognized standards bodies to specify the use of the CDMA air interface for cellular or PCS telephone communication systems.

The IS-95 standard subsequently evolved into "3G" systems such as cdma2000 and WCDMA, which provide more capacity and high-speed packet data services. Two variations of cdma2000 are presented in the documents IS-2000 (cdma2000 1xRTT) and IS-856 (cdma2000 1xEV-DO), which are issued by the TIA. The cdma2000 1xRTT communication system offers a peak data rate of 153 kbps, whereas the cdma2000 1xEV-DO communication system defines a set of data rates ranging from 38.4 kbps to 2.4 Mbps. The WCDMA standard is embodied in Third Generation Partnership Project (3GPP) documents 3G TS 25.211, 3G TS 25.212, 3G TS 25.213, and 3G TS 25.214. The International Mobile Telecommunications Advanced (IMT-Advanced) specification sets out "4G" standards. For high-mobility communication (e.g., from trains and cars), the IMT-Advanced specification sets a peak data rate of 100 megabits per second (Mbit/s) for 4G service; for low-mobility communication (e.g., from pedestrians and stationary users), it sets a peak data rate of 1 gigabit per second (Gbit/s).

Devices that compress speech by extracting parameters related to a model of human speech generation are called speech coders. A speech coder may include an encoder and a decoder. The encoder divides the incoming speech signal into blocks of time, or analysis frames. The duration of each time segment (or "frame") may be chosen to be short enough that the spectral envelope of the signal can be expected to remain relatively stationary. For example, the frame length may be 20 milliseconds, which corresponds to 160 samples at a sampling rate of 8 kilohertz (kHz), although any frame length or sampling rate deemed suitable for a particular application may be used.
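The frame arithmetic in the preceding paragraph can be checked with a short sketch. The function name `split_into_frames` and its defaults are illustrative assumptions, not part of the disclosure; the sketch simply cuts a sample sequence into non-overlapping 20 ms frames at an 8 kHz sampling rate.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=20):
    """Split a sample sequence into consecutive, non-overlapping frames.

    Hypothetical helper for illustration only; a real encoder may use
    overlapping analysis windows or look-ahead.
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # 8000 * 20 / 1000 = 160
    usable = len(samples) - len(samples) % frame_len  # drop a trailing partial frame
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


frames = split_into_frames([0.0] * 8000)  # one second of silence at 8 kHz
print(len(frames), len(frames[0]))  # 50 160
```

One second of audio therefore yields 50 frames of 160 samples each under these assumed settings.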

The encoder analyzes the incoming speech frame to extract certain relevant parameters and then quantizes the parameters into a binary representation (e.g., a set of bits or a binary data packet). The data packets are transmitted over a communication channel (i.e., a wired and/or wireless network connection) to a receiver and a decoder. The decoder processes the data packets, dequantizes the processed data packets to produce the parameters, and resynthesizes the speech frames using the dequantized parameters.

The function of a speech coder is to compress the digitized speech signal into a low-bit-rate signal by removing the natural redundancies inherent in speech. Digital compression may be achieved by representing the input speech frame with a set of parameters and using quantization to represent the parameters with a set of bits. If the input speech frame has a number of bits N_i and the data packet produced by the speech coder has a number of bits N_o, the compression factor achieved by the speech coder is C_r = N_i/N_o. The challenge is to retain high voice quality in the decoded speech while achieving the target compression factor. The performance of a speech coder depends on (1) how well the speech model, or the combination of the analysis and synthesis processes described above, performs, and (2) how well the parameter quantization process performs at the target bit rate of N_o bits per frame. The goal of the speech model is thus to capture the essence of the speech signal, or the target voice quality, with a small set of parameters for each frame.
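As a worked instance of the compression factor C_r = N_i/N_o, the sketch below combines figures that appear elsewhere in this background (a 64 kbps digitized input, an 8 kbps coded stream, 20 ms frames). Pairing those particular numbers is an assumption made for illustration; the function names are hypothetical.

```python
def bits_per_frame(bit_rate_bps, frame_ms=20):
    """Number of bits carried by one frame at the given bit rate."""
    return bit_rate_bps * frame_ms // 1000


def compression_factor(n_i, n_o):
    """C_r = N_i / N_o, with both counts in bits per frame."""
    return n_i / n_o


n_i = bits_per_frame(64_000)  # input frame: 1280 bits
n_o = bits_per_frame(8_000)   # coded packet: 160 bits
print(n_i, n_o, compression_factor(n_i, n_o))  # 1280 160 8.0
```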

Speech coders generally use a set of parameters (including vectors) to describe the speech signal. A good set of parameters ideally requires low system bandwidth for the reconstruction of a perceptually accurate speech signal. Pitch, signal power, spectral envelope (or formants), amplitude, and phase spectra are examples of speech coding parameters.

A speech coder may be implemented as a time-domain coder, which attempts to capture the time-domain speech waveform by using high time-resolution processing to encode small segments of speech (e.g., 5 millisecond (ms) subframes) at a time. For each subframe, a high-precision representative from a codebook space is found by means of a search algorithm. Alternatively, a speech coder may be implemented as a frequency-domain coder, which attempts to capture the short-term speech spectrum of the input speech frame with a set of parameters (analysis) and uses a corresponding synthesis process to recreate the speech waveform from the spectral parameters. The parameter quantizer preserves the parameters by representing them with stored representations of code vectors in accordance with known quantization techniques.

One time-domain speech coder is the code-excited linear prediction (CELP) coder. In a CELP coder, the short-term correlations, or redundancies, in the speech signal are removed by a linear prediction (LP) analysis, which finds the coefficients of a short-term formant filter. Applying the short-term prediction filter to the incoming speech frame generates an LP residual signal, which is further modeled and quantized with long-term prediction filter parameters and a subsequent stochastic codebook. CELP coding thus divides the task of encoding the time-domain speech waveform into the separate tasks of encoding the LP short-term filter coefficients and encoding the LP residual. Time-domain coding can be performed at a fixed rate (i.e., using the same number of bits, N_o, for each frame) or at a variable rate (in which different bit rates are used for different types of frame content). Variable-rate coders attempt to use only the number of bits needed to encode the parameters to a level adequate to obtain a target quality.
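The LP analysis step described above can be sketched in a few lines. This is a minimal illustration that assumes the textbook autocorrelation method with the Levinson-Durbin recursion and a second-order predictor; the source does not say which LP method or order a given coder uses, and all function names here are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (zero outside the frame)."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:  # silent or perfectly predicted frame
            break
        k_i = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        error *= 1.0 - k_i * k_i
    return a[1:]


def lp_residual(frame, coeffs):
    """Apply the short-term prediction-error filter to obtain the LP residual."""
    residual = []
    for n, x in enumerate(frame):
        prediction = sum(c * frame[n - k]
                         for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(x - prediction)
    return residual


# A 200 Hz tone sampled at 8 kHz: highly predictable, so the residual is small.
frame = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(160)]
coeffs = levinson_durbin(autocorrelation(frame, 2), 2)
residual = lp_residual(frame, coeffs)
signal_energy = sum(x * x for x in frame)
residual_energy = sum(e * e for e in residual)
print(residual_energy / signal_energy)  # a small fraction of the input energy
```

The residual carries what the short-term filter could not predict; in a CELP coder it is this signal that the long-term predictor and codebook then model.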

Time-domain coders such as the CELP coder may rely on a high number of bits per frame, N_o, to preserve the accuracy of the time-domain speech waveform. Such coders can deliver excellent voice quality provided that the number of bits per frame, N_o, is relatively large (e.g., 8 kbps or above). At low bit rates (e.g., 4 kbps and below), time-domain coders may fail to retain high quality and robust performance because of the limited number of available bits. At low bit rates, the limited codebook space clips the waveform-matching capability of time-domain coders deployed in higher-rate commercial applications. Hence, many CELP coding systems operating at low bit rates suffer from perceptually significant distortion characterized as noise.

An alternative to CELP coders at low bit rates is the "noise-excited linear prediction" (NELP) coder, which operates on principles similar to those of a CELP coder. NELP coders use a filtered pseudorandom noise signal, rather than a codebook, to model speech. Because NELP uses a simpler model for coded speech, it achieves a lower bit rate than CELP. NELP may be used for compressing or representing unvoiced speech or silence.

Coding systems that operate at rates on the order of 2.4 kbps are generally parametric in nature. That is, they operate by transmitting, at regular intervals, parameters describing the pitch period and the spectral envelope (or formants) of the speech signal. An example of such a parametric coder is the LP vocoder.

LP vocoders model a voiced speech signal with a single pulse per pitch period. This basic technique may be augmented to include transmitted information about the spectral envelope, among other things. Although LP vocoders provide reasonable performance in general, they may introduce perceptually significant distortion characterized as buzz.

In recent years, coders have emerged that are hybrids of waveform coders and parametric coders. An example of these hybrid coders is the prototype-waveform interpolation (PWI) speech coding system, which may also be referred to as a prototype pitch period (PPP) speech coder. A PWI speech coding system provides an efficient method for coding voiced speech. The basic concept of PWI is to extract a representative pitch cycle (the prototype waveform) at fixed intervals, transmit its description, and reconstruct the speech signal by interpolating between the prototype waveforms. The PWI method may operate on either the LP residual signal or the speech signal.

In traditional telephone systems (e.g., the public switched telephone network (PSTN)), signal bandwidth is limited to the frequency range of 300 hertz (Hz) to 3.4 kilohertz (kHz). In wideband (WB) applications, such as cellular telephony and Voice over Internet Protocol (VoIP), signal bandwidth may span the frequency range from 50 Hz to 7 kHz. Super wideband (SWB) coding techniques support bandwidth that extends up to around 16 kHz. Extending signal bandwidth from narrowband telephony at 3.4 kHz to SWB telephony at 16 kHz may improve the quality, intelligibility, and naturalness of signal reconstruction.

Wideband coding techniques involve encoding and transmitting the lower-frequency portion of a signal (e.g., 50 Hz to 7 kHz, also called the "low band"). To improve coding efficiency, the higher-frequency portion of the signal (e.g., 7 kHz to 16 kHz, also called the "high band") may not be fully encoded and transmitted. Instead, properties of the low-band signal may be used to generate the high-band signal. For example, a high-band excitation signal may be generated from the low-band residual using a nonlinear model (e.g., an absolute value function). When the low-band residual is sparsely coded with pulses, the high-band excitation signal generated from the sparsely coded residual may produce artifacts in unvoiced regions of the high band.
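To make the role of the nonlinear model concrete, the sketch below shows why an absolute value function can create content above the band of its input: rectifying a tone removes energy at the tone's own frequency and places it at even multiples of that frequency. This is an illustrative simplification under stated assumptions; the helper names are hypothetical, and the resampling and filtering that a real coder would need around this step are omitted.

```python
import cmath
import math


def dft_magnitude(signal, bin_index):
    """Magnitude of a single DFT bin, computed directly."""
    n = len(signal)
    return abs(sum(x * cmath.exp(-2j * math.pi * bin_index * i / n)
                   for i, x in enumerate(signal)))


def nonlinear_extend(low_band_residual):
    """Absolute-value nonlinearity followed by removal of the DC offset."""
    rectified = [abs(x) for x in low_band_residual]
    mean = sum(rectified) / len(rectified)
    return [x - mean for x in rectified]


# A tone that falls exactly on DFT bin 5 of a 160-sample frame.
tone = [math.sin(2.0 * math.pi * 5 * n / 160) for n in range(160)]
extended = nonlinear_extend(tone)

# The input has no energy at bin 10; the rectified signal does.
print(dft_magnitude(tone, 10) < 1e-6, dft_magnitude(extended, 10) > 10.0)  # True True
```

The same mechanism explains the artifact noted above: a residual made of a few sparse pulses passes its pulse structure straight through the nonlinearity, which is audible when the underlying sound is noise-like.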

Summary of the Invention

Systems and methods for high-band excitation signal generation are disclosed. An audio decoder may receive an audio signal that was encoded by an audio encoder at a transmitting device. The audio decoder may determine a voicing classification (e.g., strongly voiced, weakly voiced, weakly unvoiced, or strongly unvoiced) of a particular audio signal. For example, the particular audio signal may range from strongly voiced (e.g., a speech signal) to strongly unvoiced (e.g., a noise signal). The audio decoder may control an amount of an envelope of a representation of the input signal based on the voicing classification.

Controlling the amount of the envelope may include controlling a characteristic of the envelope (e.g., its shape, frequency range, gain, and/or magnitude). For example, the audio decoder may generate a low-band excitation signal from the encoded audio signal and may control the shape of the envelope of the low-band excitation signal based on the voicing classification. For example, the audio decoder may control the frequency range of the envelope based on a cutoff frequency of a filter applied to the low-band excitation signal. As another example, the audio decoder may control the magnitude of the envelope, the shape of the envelope, the gain of the envelope, or a combination thereof, by adjusting one or more poles of linear predictive coding (LPC) coefficients based on the voicing classification. As another example, the audio decoder may control the magnitude of the envelope, the shape of the envelope, the gain of the envelope, or a combination thereof, by adjusting coefficients of a filter based on the voicing classification, where the filter is applied to the low-band excitation signal.

The audio decoder may modulate a white noise signal based on the controlled amount of the envelope. For example, the modulated white noise signal may correspond more closely to the low-band excitation signal when the voicing classification is strongly voiced than when it is strongly unvoiced. The audio decoder may generate a high-band excitation signal based on the modulated white noise signal. For example, the audio decoder may extend the low-band excitation signal and may combine the modulated white noise signal with the extended low-band signal to generate the high-band excitation signal.
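The processing chain summarized above (classify voicing, control the amount of envelope, modulate white noise, combine with the extended low band) can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the mapping from classification to envelope amount, the one-pole envelope follower, the blend between a flat and a tracked envelope, and the fixed mixing weight are all assumptions chosen to keep the example small.

```python
import random

# Hypothetical mapping: how much of the low-band envelope shapes the noise.
ENVELOPE_AMOUNT = {
    "strongly_voiced": 1.0,
    "weakly_voiced": 0.7,
    "weakly_unvoiced": 0.3,
    "strongly_unvoiced": 0.0,
}


def temporal_envelope(signal, smoothing=0.9):
    """One-pole smoothing of the rectified signal (assumed envelope follower)."""
    envelope, state = [], 0.0
    for x in signal:
        state = smoothing * state + (1.0 - smoothing) * abs(x)
        envelope.append(state)
    return envelope


def modulate_white_noise(low_band_excitation, voicing, rng):
    """Shape white noise by a voicing-controlled amount of the envelope."""
    amount = ENVELOPE_AMOUNT[voicing]
    envelope = temporal_envelope(low_band_excitation)
    mean_env = sum(envelope) / len(envelope) or 1.0  # guard against silence
    noise = [rng.gauss(0.0, 1.0) for _ in envelope]
    # amount = 1 follows the envelope fully; amount = 0 leaves the noise flat.
    return [n * (amount * e / mean_env + (1.0 - amount))
            for n, e in zip(noise, envelope)]


def generate_high_band_excitation(extended_low_band, low_band_excitation,
                                  voicing, rng, noise_mix=0.5):
    """Combine the extended low band with the modulated noise (assumed weights)."""
    noise = modulate_white_noise(low_band_excitation, voicing, rng)
    return [(1.0 - noise_mix) * h + noise_mix * n
            for h, n in zip(extended_low_band, noise)]


# Excitation that is silent for half a frame, then active.
excitation = [0.0] * 80 + [1.0] * 80
shaped = modulate_white_noise(excitation, "strongly_voiced", random.Random(1))
flat = modulate_white_noise(excitation, "strongly_unvoiced", random.Random(1))
print(max(abs(v) for v in shaped[:80]))     # 0.0: the noise follows the silent half
print(max(abs(v) for v in flat[:80]) > 0.0)  # True: the noise ignores the envelope
```

With the strongly unvoiced setting the noise stays unmodulated, so sparse pulses in the low-band excitation do not imprint on the high band; with the strongly voiced setting the noise tracks the excitation envelope.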

In a particular embodiment, a method includes determining, at a device, a voicing classification of an input signal. The input signal corresponds to an audio signal. The method also includes controlling an amount of an envelope of a representation of the input signal based on the voicing classification. The method further includes modulating a white noise signal based on the controlled amount of the envelope. The method includes generating a high-band excitation signal based on the modulated white noise signal.

In another particular embodiment, an apparatus includes a voicing classifier, an envelope adjuster, a modulator, and an output circuit. The voicing classifier is configured to determine a voicing classification of an input signal. The input signal corresponds to an audio signal. The envelope adjuster is configured to control an amount of an envelope of a representation of the input signal based on the voicing classification. The modulator is configured to modulate a white noise signal based on the controlled amount of the envelope. The output circuit is configured to generate a high-band excitation signal based on the modulated white noise signal.

In another particular embodiment, a computer-readable storage device stores instructions that, when executed by at least one processor, cause the at least one processor to determine a voicing classification of an input signal. The instructions, when executed by the at least one processor, further cause the at least one processor to control an amount of an envelope of a representation of the input signal based on the voicing classification, to modulate a white noise signal based on the controlled amount of the envelope, and to generate a high-band excitation signal based on the modulated white noise signal.

Particular advantages provided by at least one of the disclosed embodiments include generating a smooth-sounding synthesized audio signal corresponding to an unvoiced audio signal. For example, the synthesized audio signal corresponding to the unvoiced audio signal may have few (or no) artifacts. Other aspects, advantages, and features of the present invention will become apparent after review of the application, which includes the following sections: Brief Description of the Drawings, Detailed Description, and Claims.

Brief Description of the Drawings

FIG. 1 is a diagram illustrating a particular embodiment of a system including a device operable to perform high-band excitation signal generation;

FIG. 2 is a diagram illustrating a particular embodiment of a decoder operable to perform high-band excitation signal generation;

FIG. 3 is a diagram illustrating a particular embodiment of an encoder operable to perform high-band excitation signal generation;

FIG. 4 is a diagram illustrating a particular embodiment of a method of high-band excitation signal generation;

FIG. 5 is a diagram illustrating another embodiment of a method of high-band excitation signal generation;

FIG. 6 is a diagram illustrating another embodiment of a method of high-band excitation signal generation;

FIG. 7 is a diagram illustrating another embodiment of a method of high-band excitation signal generation;

FIG. 8 is a flowchart illustrating another embodiment of a method of high-band excitation signal generation; and

FIG. 9 is a block diagram of a device operable to perform high-band excitation signal generation in accordance with the systems and methods of FIGS. 1-8.

Detailed Description

The principles described herein may be applied, for example, to a headset, a handset, or another audio device that is configured to perform high-band excitation signal generation. Unless expressly limited by its context, the term "signal" is used herein to indicate any of its ordinary meanings, including a state of a memory location (or set of memory locations) as expressed on a wire, bus, or other transmission medium. Unless expressly limited by its context, the term "generating" is used herein to indicate any of its ordinary meanings, such as computing or otherwise producing. Unless expressly limited by its context, the term "calculating" is used herein to indicate any of its ordinary meanings, such as computing, evaluating, smoothing, and/or selecting from a plurality of values. Unless expressly limited by its context, the term "obtaining" is used herein to indicate any of its ordinary meanings, such as calculating, deriving, receiving (e.g., from another component, block, or device), and/or retrieving (e.g., from a memory register or an array of storage elements).

Unless expressly limited by its context, the term "producing" is used to indicate any of its ordinary meanings, such as calculating, generating, and/or providing. Unless expressly limited by its context, the term "providing" is used to indicate any of its ordinary meanings, such as calculating, generating, and/or producing. Unless expressly limited by its context, the term "coupled" is used to indicate a direct or indirect electrical or physical connection. If the connection is indirect, it is well understood by a person having ordinary skill in the art that there may be other blocks or components between the structures being "coupled."

The term "configuration" may be used in reference to a method, apparatus/device, and/or system as indicated by its particular context. Where the term "comprising" is used in the present description and claims, it does not exclude other components or operations. The term "based on" (as in "A is based on B") is used to indicate any of its ordinary meanings, including the cases (i) "based on at least" (e.g., "A is based on at least B") and, if appropriate in the particular context, (ii) "equal to" (e.g., "A is equal to B"). In case (i), where "A is based on B" includes "based on at least," this may include a configuration in which A is coupled to B. Similarly, the term "in response to" is used to indicate any of its ordinary meanings, including "in response to at least." The term "at least one" is used to indicate any of its ordinary meanings, including "one or more." The term "at least two" is used to indicate any of its ordinary meanings, including "two or more."

Unless the particular context indicates otherwise, the terms "apparatus" and "device" are used generically and interchangeably. Unless otherwise indicated, any disclosure of an operation of an apparatus having a particular feature is also expressly intended to disclose a method having an analogous feature (and vice versa), and any disclosure of an operation of an apparatus according to a particular configuration is also expressly intended to disclose a method according to an analogous configuration (and vice versa). Unless the particular context indicates otherwise, the terms "method," "process," "procedure," and "technique" are used generically and interchangeably. The terms "component" and "module" may be used to indicate a portion of a greater configuration. Any incorporation by reference of a portion of a document shall also be understood to incorporate definitions of terms or variables that are referenced within the portion, where such definitions appear elsewhere in the document, as well as any figures referenced in the incorporated portion.

As used herein, the term "communication device" refers to an electronic device that may be used for voice and/or data communication over a wireless communication network. Examples of communication devices include cellular phones, personal digital assistants (PDAs), handheld devices, headsets, wireless modems, laptop computers, personal computers, etc.

Referring to FIG. 1, a particular embodiment of a system that includes devices operable to perform high-band excitation signal generation is shown and generally designated 100. In a particular embodiment, one or more components of the system 100 may be integrated into a decoding system or apparatus (e.g., in a wireless telephone or a coder/decoder (CODEC)), into an encoding system or apparatus, or both. In other embodiments, one or more components of the system 100 may be integrated into a set-top box, a music player, a video player, an entertainment unit, a navigation device, a communication device, a personal digital assistant (PDA), a fixed-location data unit, or a computer.

It should be noted that in the following description, the various functions performed by the system 100 of FIG. 1 are described as being performed by certain components or modules. This division of components and modules is for illustration only. In alternative embodiments, a function performed by a particular component or module may be divided among multiple components or modules. Moreover, in alternative embodiments, two or more components or modules of FIG. 1 may be integrated into a single component or module. Each component or module illustrated in FIG. 1 may be implemented using hardware (e.g., a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a controller, etc.), software (e.g., instructions executable by a processor), or any combination thereof.

Although the illustrative embodiments depicted in FIGS. 1-9 are described with respect to a high-band model similar to that used in the Enhanced Variable Rate Codec - Narrowband-Wideband (EVRC-NW), one or more of the illustrative embodiments may use any other high-band model. It should be understood that the use of any particular model is described by way of example only.

The system 100 includes a mobile device 104 in communication with a first device 102 via a network 120. The mobile device 104 may be coupled to or in communication with a microphone 146. The mobile device 104 may include an excitation signal generation module 122, a high-band encoder 172, a multiplexer (MUX) 174, a transmitter 176, or a combination thereof. The first device 102 may be coupled to or in communication with a speaker 142. The first device 102 may include the excitation signal generation module 122 coupled to a MUX 170 via a high-band synthesizer 168. The excitation signal generation module 122 may include a voicing classifier 160, an envelope adjuster 162, a modulator 164, an output circuit 166, or a combination thereof.

During operation, the mobile device 104 may receive an input signal 130 (e.g., a user speech signal of a first user 152, an unvoiced signal, or both). For example, the first user 152 may be engaged in a voice call with a second user 154. The first user 152 may use the mobile device 104, and the second user 154 may use the first device 102 for the voice call. During the voice call, the first user 152 may speak into the microphone 146 coupled to the mobile device 104. The input signal 130 may correspond to speech of the first user 152, background noise (e.g., music, street noise, another person's speech, etc.), or a combination thereof. The mobile device 104 may receive the input signal 130 via the microphone 146.

In a particular embodiment, the input signal 130 may be a super wideband (SWB) signal that includes data in the frequency range from approximately 50 hertz (Hz) to approximately 16 kilohertz (kHz). The low-band portion of the input signal 130 and the high-band portion of the input signal 130 may occupy non-overlapping frequency bands of 50 Hz to 7 kHz and 7 kHz to 16 kHz, respectively. In an alternative embodiment, the low-band portion and the high-band portion may occupy non-overlapping frequency bands of 50 Hz to 8 kHz and 8 kHz to 16 kHz, respectively. In another alternative embodiment, the low-band portion and the high-band portion may overlap (e.g., 50 Hz to 8 kHz and 7 kHz to 16 kHz, respectively).

In a particular embodiment, the input signal 130 may be a wideband (WB) signal having a frequency range of approximately 50 Hz to approximately 8 kHz. In this embodiment, the low-band portion of the input signal 130 may correspond to a frequency range of approximately 50 Hz to approximately 6.4 kHz, and the high-band portion of the input signal 130 may correspond to a frequency range of approximately 6.4 kHz to approximately 8 kHz.

In a particular embodiment, the microphone 146 may capture the input signal 130, and an analog-to-digital converter (ADC) at the mobile device 104 may convert the captured input signal 130 from an analog waveform into a digital waveform consisting of digital audio samples. The digital audio samples may be processed by a digital signal processor. A gain adjuster may adjust a gain (e.g., of the analog waveform or the digital waveform) by increasing or decreasing the amplitude level of the audio signal (e.g., the analog waveform or the digital waveform). The gain adjuster may operate in either the analog or the digital domain. For example, the gain adjuster may operate in the digital domain and may adjust the digital audio samples produced by the analog-to-digital converter. After gain adjustment, an echo canceller may reduce any echo that may have been created by the output of a speaker entering the microphone 146. The digital audio samples may be "compressed" by a vocoder (a voice encoder-decoder). The output of the echo canceller may be coupled to vocoder pre-processing blocks, e.g., filters, noise processors, rate converters, etc. An encoder of the vocoder may compress the digital audio samples and form a transmit packet (a representation of the compressed bits of the digital audio samples). In a particular embodiment, the encoder of the vocoder may include the excitation signal generation module 122. The excitation signal generation module 122 may generate a high-band excitation signal 186, as described with reference to the first device 102. The excitation signal generation module 122 may provide the high-band excitation signal 186 to the high-band encoder 172.

The high-band encoder 172 may encode a high-band signal of the input signal 130 based on the high-band excitation signal 186. For example, the high-band encoder 172 may generate a high-band bitstream 190 based on the high-band excitation signal 186. The high-band bitstream 190 may include high-band parameter information. For example, the high-band bitstream 190 may include at least one of: high-band linear predictive coding (LPC) coefficients, high-band line spectral frequencies (LSFs), high-band line spectral pairs (LSPs), a gain shape (e.g., temporal gain parameters corresponding to subframes of a particular frame), a gain frame (e.g., a gain parameter corresponding to an energy ratio of the high band to the low band for a particular frame), or other parameters corresponding to the high-band portion of the input signal 130. In a particular embodiment, the high-band encoder 172 may determine the high-band LPC coefficients using at least one of a vector quantizer, a hidden Markov model (HMM), or a Gaussian mixture model (GMM). The high-band encoder 172 may determine the high-band LSFs, the high-band LSPs, or both, based on the LPC coefficients.

The high-band encoder 172 may generate the high-band parameter information based on the high-band signal of the input signal 130. For example, a decoder of the mobile device 104 may emulate the decoder of the first device 102. The decoder of the mobile device 104 may generate a synthesized audio signal based on the high-band excitation signal 186, as described with reference to the first device 102. The high-band encoder 172 may generate gain values (e.g., the gain shape, the gain frame, or both) based on a comparison of the synthesized audio signal and the input signal 130. For example, the gain values may correspond to a difference between the synthesized audio signal and the input signal 130. The high-band encoder 172 may provide the high-band bitstream 190 to the MUX 174.
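The comparison between the synthesized signal and the input signal can be illustrated with a minimal Python sketch. It is not the encoder's actual procedure: it assumes the "comparison" is an energy ratio, with one gain per frame and one gain per subframe, and the function names, the four-subframe split, and the `floor` guard are all hypothetical.

```python
import math


def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)


def gain_frame(target, synthesized, floor=1e-12):
    """One gain per frame: amplitude ratio of target to synthesized (assumed form)."""
    return math.sqrt(energy(target) / max(energy(synthesized), floor))


def gain_shape(target, synthesized, num_subframes=4, floor=1e-12):
    """One gain per subframe, computed the same way on equal-length slices."""
    size = len(target) // num_subframes
    gains = []
    for k in range(num_subframes):
        lo, hi = k * size, (k + 1) * size
        ratio = energy(target[lo:hi]) / max(energy(synthesized[lo:hi]), floor)
        gains.append(math.sqrt(ratio))
    return gains
```

Under these assumptions, a target that is exactly twice the synthesized signal yields a frame gain of 2 and four subframe gains of 2.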

The MUX 174 may combine the high-band bitstream 190 with a low-band bitstream to generate a bitstream 132. A low-band encoder of the mobile device 104 may generate the low-band bitstream based on a low-band signal of the input signal 130. The low-band bitstream may include low-band parameter information (e.g., low-band LPC coefficients, low-band LSFs, or both) and a low-band excitation signal (e.g., a low-band residual of the input signal 130). The transmit packet may correspond to the bitstream 132.

The transmit packet may be stored in a memory that may be shared with a processor of the mobile device 104. The processor may be a control processor in communication with the digital signal processor. The mobile device 104 may transmit the bitstream 132 to the first device 102 via the network 120. For example, the transmitter 176 may modulate some form of the transmit packet (other information may be appended to the transmit packet) and send the modulated information over the air via an antenna.

The excitation signal generation module 122 of the first device 102 may receive the bitstream 132. For example, an antenna of the first device 102 may receive some form of incoming packets that include the transmit packet. The bitstream 132 may correspond to frames of a pulse code modulation (PCM) encoded audio signal. For example, an analog-to-digital converter (ADC) at the first device 102 may convert the bitstream 132 from an analog signal to a digital PCM signal having multiple frames.

The transmit packet may be "uncompressed" by a decoder of a vocoder at the first device 102. The uncompressed waveform (or the digital PCM signal) may be referred to as reconstructed audio samples. The reconstructed audio samples may be post-processed by vocoder post-processing blocks and may be used by an echo canceller to remove echo. For clarity, the decoder of the vocoder and the vocoder post-processing blocks may be referred to as a vocoder decoder module. In some configurations, the output of the echo canceller may be processed by the excitation signal generation module 122. Alternatively, in other configurations, the output of the vocoder decoder module may be processed by the excitation signal generation module 122.

The excitation signal generation module 122 may extract the low-band parameter information, the low-band excitation signal, and the high-band parameter information from the bitstream 132. The voicing classifier 160 may determine a voicing classification 180 (e.g., a value from 0.0 to 1.0) indicating a voiced/unvoiced nature (e.g., strongly voiced, weakly voiced, weakly unvoiced, or strongly unvoiced) of the input signal 130, as described with reference to FIG. 2. The voicing classifier 160 may provide the voicing classification 180 to the envelope adjuster 162.

The envelope adjuster 162 may determine an envelope of a representation of the input signal 130. The envelope may be a time-varying envelope. For example, the envelope may be updated more than once per frame of the input signal 130. As another example, the envelope may be updated in response to the envelope adjuster 162 receiving each sample of the input signal 130. The shape of the envelope may vary to a greater degree when the voicing classification 180 corresponds to strongly voiced than when the voicing classification corresponds to strongly unvoiced. The representation of the input signal 130 may include a low-band excitation signal of the input signal 130 (or of an encoded version of the input signal 130), a high-band excitation signal of the input signal 130 (or of an encoded version of the input signal 130), or a harmonically extended excitation signal. For example, the excitation signal generation module 122 may generate the harmonically extended excitation signal by extending the low-band excitation signal of the input signal 130 (or of an encoded version of the input signal 130).

The envelope adjuster 162 may control an amount of the envelope based on the voicing classification 180, as described with reference to FIGS. 4 to 7. The envelope adjuster 162 may control the amount of the envelope by controlling a characteristic of the envelope (e.g., a shape, a magnitude, a gain, and/or a frequency range). For example, the envelope adjuster 162 may control the frequency range of the envelope based on a cutoff frequency of a filter, as described with reference to FIG. 4. The cutoff frequency may be determined based on the voicing classification 180.
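As a rough illustration of a cutoff frequency driven by the voicing classification, the sketch below smooths the rectified signal with a one-pole low-pass filter. The filter type, the linear mapping from voicing to cutoff, the 10 Hz and 1000 Hz limits, and the 16 kHz sample rate are all assumptions made for the example and are not taken from the description.

```python
import math


def smoothing_coefficient(voicing, sample_rate=16000.0, f_min=10.0, f_max=1000.0):
    """Map a voicing value in [0, 1] to a one-pole low-pass coefficient.

    Hypothetical mapping: a higher voicing value gives a higher cutoff, so the
    envelope is allowed to vary more for strongly voiced input.
    """
    voicing = min(max(voicing, 0.0), 1.0)
    cutoff_hz = f_min + voicing * (f_max - f_min)
    return math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)


def controlled_envelope(signal, voicing):
    """Per-sample envelope: low-pass filtered absolute value of the signal."""
    alpha = smoothing_coefficient(voicing)
    state = 0.0
    envelope = []
    for sample in signal:
        state = alpha * state + (1.0 - alpha) * abs(sample)
        envelope.append(state)
    return envelope
```

With this mapping, a strongly unvoiced frame (voicing near 0.0) produces a nearly flat envelope, while a strongly voiced frame (voicing near 1.0) produces an envelope that tracks the signal more closely.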

As another example, the envelope adjuster 162 may control the shape of the envelope, the magnitude of the envelope, the gain of the envelope, or a combination thereof, by adjusting one or more poles of the high-band linear predictive coding (LPC) coefficients based on the voicing classification 180, as described with reference to FIG. 5. As another example, the envelope adjuster 162 may control the shape of the envelope, the magnitude of the envelope, the gain of the envelope, or a combination thereof, by adjusting coefficients of a filter based on the voicing classification 180, as described with reference to FIG. 6. The characteristic of the envelope may be controlled in a transform domain (e.g., a frequency domain) or in a time domain, as described with reference to FIGS. 4 to 6.

The envelope adjuster 162 may provide a signal envelope 182 to the modulator 164. The signal envelope 182 may correspond to the controlled amount of the envelope of the representation of the input signal 130.

The modulator 164 may use the signal envelope 182 to modulate white noise 156 to generate modulated white noise 184. The modulator 164 may provide the modulated white noise 184 to the output circuit 166.
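A minimal sketch of this modulation step, assuming that "modulate" means a sample-by-sample multiplication of Gaussian white noise by the envelope. The function name, the Gaussian noise source, and the fixed seed are illustrative choices and not details from the description.

```python
import random


def modulate_white_noise(envelope, seed=0):
    """Return (noise, modulated), where modulated[n] = envelope[n] * noise[n]."""
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    noise = [rng.gauss(0.0, 1.0) for _ in envelope]
    modulated = [e * n for e, n in zip(envelope, noise)]
    return noise, modulated
```

Where the envelope is zero the modulated noise is silent, and where the envelope is one the noise passes through unchanged.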

The output circuit 166 may generate the high-band excitation signal 186 based on the modulated white noise 184. For example, the output circuit 166 may combine the modulated white noise 184 with another signal to generate the high-band excitation signal 186. In a particular embodiment, the other signal may correspond to an extended signal generated based on the low-band excitation signal. For example, the output circuit 166 may generate the extended signal by upsampling the low-band excitation signal, applying an absolute value function to the upsampled signal, downsampling the result of applying the absolute value function, and using adaptive whitening to spectrally flatten the downsampled signal with a linear prediction filter (e.g., a fourth-order linear prediction filter). In a particular embodiment, the output circuit 166 may scale the modulated white noise 184 and the other signal based on a harmonicity parameter, as described with reference to FIGS. 4 to 7.
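The four steps of the extended-signal chain can be sketched as follows. This is a simplified illustration under several assumptions: the resampling factor of two, the linear-interpolation upsampler, and the pair-averaging downsampler are crude stand-ins for proper resampling filters, and the whitening filter is obtained with the autocorrelation method and the Levinson-Durbin recursion, which the description does not specify. All function names are hypothetical.

```python
def upsample_by_two(x):
    """Linear-interpolation upsampling (stand-in for a proper interpolator)."""
    out = []
    for n, sample in enumerate(x):
        nxt = x[n + 1] if n + 1 < len(x) else sample
        out.extend([sample, 0.5 * (sample + nxt)])
    return out


def downsample_by_two(x):
    """Average adjacent pairs and keep one value per pair."""
    return [0.5 * (x[n] + x[n + 1]) for n in range(0, len(x) - 1, 2)]


def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Prediction-error filter [1, a1, ..., a_order] from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate input: leave remaining taps at zero
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a


def whiten(x, order=4):
    """Spectrally flatten x with its own linear prediction error filter."""
    a = levinson_durbin(autocorrelation(x, order), order)
    return [sum(a[k] * x[n - k] for k in range(order + 1) if n - k >= 0)
            for n in range(len(x))]


def extended_signal(low_band_excitation, order=4):
    """Upsample, rectify, downsample, then whiten with an order-4 LP filter."""
    rectified = [abs(v) for v in upsample_by_two(low_band_excitation)]
    return whiten(downsample_by_two(rectified), order)
```

The absolute value is the nonlinearity that creates new harmonic content; the whitening stage then removes the spectral tilt that the nonlinearity introduces.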

In a particular embodiment, the output circuit 166 may combine a first ratio of the modulated white noise and a second ratio of unmodulated white noise to generate scaled white noise, where the first ratio and the second ratio are determined based on the voicing classification 180, as described with reference to FIG. 7. In this embodiment, the output circuit 166 may combine the scaled white noise with the other signal to generate the high-band excitation signal 186. The output circuit 166 may provide the high-band excitation signal 186 to the high-band synthesizer 168.
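The mixing of modulated and unmodulated noise can be sketched as a weighted sum. The specific choice below (first ratio equal to the voicing value, second ratio equal to one minus the voicing value) is an assumption for illustration; the description only states that both ratios depend on the voicing classification.

```python
def scaled_white_noise(modulated, unmodulated, voicing):
    """Blend modulated and unmodulated noise using voicing-dependent ratios."""
    v = min(max(voicing, 0.0), 1.0)
    first_ratio = v          # weight of the modulated noise (assumed mapping)
    second_ratio = 1.0 - v   # weight of the unmodulated noise (assumed mapping)
    return [first_ratio * m + second_ratio * u
            for m, u in zip(modulated, unmodulated)]
```

With this mapping, strongly voiced input uses mostly envelope-shaped noise, and strongly unvoiced input uses mostly flat noise.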

The high-band synthesizer 168 may generate a synthesized high-band signal 188 based on the high-band excitation signal 186. For example, the high-band synthesizer 168 may model and/or decode the high-band parameter information based on a particular high-band model and may use the high-band excitation signal 186 to generate the synthesized high-band signal 188. The high-band synthesizer 168 may provide the synthesized high-band signal 188 to the MUX 170.

A low-band decoder of the first device 102 may generate a synthesized low-band signal. For example, the low-band decoder may decode and/or model the low-band parameter information based on a particular low-band model and may use the low-band excitation signal to generate the synthesized low-band signal. The MUX 170 may combine the synthesized high-band signal 188 and the synthesized low-band signal to generate an output signal 116 (e.g., a decoded audio signal).

The output signal 116 may be amplified or suppressed by a gain adjuster. The first device 102 may provide the output signal 116 to the second user 154 via the speaker 142. For example, the output of the gain adjuster may be converted from a digital signal to an analog signal by a digital-to-analog converter and played out via the speaker 142.

Thus, the system 100 may enable generation of a "smooth" sounding synthesized signal when the synthesized audio signal corresponds to an unvoiced (or strongly unvoiced) input signal. A synthesized high-band signal may be generated using a noise signal that is modulated based on a voicing classification of the input signal. The modulated noise signal may correspond more closely to the input signal when the input signal is strongly voiced than when the input signal is strongly unvoiced. In a particular embodiment, the synthesized high-band signal may have reduced or no sparseness when the input signal is strongly unvoiced, resulting in a smoother (e.g., having fewer artifacts) synthesized audio signal.

Referring to FIG. 2, a particular embodiment of a decoder operable to perform high-band excitation signal generation is disclosed and generally designated 200. In a particular embodiment, the decoder 200 may correspond to, or be included in, the system 100 of FIG. 1. For example, the decoder 200 may be included in the first device 102, the mobile device 104, or both. The decoder 200 may illustrate decoding of an encoded audio signal at a receiving device (e.g., the first device 102).

The decoder 200 includes a demultiplexer (DEMUX) 202 coupled to a low-band synthesizer 204, a voicing factor generator 208, and the high-band synthesizer 168. The low-band synthesizer 204 and the voicing factor generator 208 may be coupled to the high-band synthesizer 168 via an excitation signal generator 222. In a particular embodiment, the voicing factor generator 208 may correspond to the voicing classifier 160 of FIG. 1. The excitation signal generator 222 may be a particular embodiment of the excitation signal generation module 122 of FIG. 1. For example, the excitation signal generator 222 may include the envelope adjuster 162, the modulator 164, the output circuit 166, the voicing classifier 160, or a combination thereof. The low-band synthesizer 204 and the high-band synthesizer 168 may be coupled to the MUX 170.

During operation, the DEMUX 202 may receive the bitstream 132. The bitstream 132 may correspond to frames of a pulse code modulation (PCM) encoded audio signal. For example, an analog-to-digital converter (ADC) at the first device 102 may convert the bitstream 132 from an analog signal to a digital PCM signal having multiple frames. The DEMUX 202 may generate a low-band portion 232 of the bitstream and a high-band portion 218 of the bitstream from the bitstream 132. The DEMUX 202 may provide the low-band portion 232 of the bitstream to the low-band synthesizer 204 and may provide the high-band portion 218 of the bitstream to the high-band synthesizer 168.

The low-band synthesizer 204 may extract and/or decode one or more parameters 242 (e.g., low-band parameter information of the input signal 130) and a low-band excitation signal 244 (e.g., a low-band residual of the input signal 130) from the low-band portion 232 of the bitstream. In a particular embodiment, the low-band synthesizer 204 may extract a harmonicity parameter 246 from the low-band portion 232 of the bitstream.

The harmonicity parameter 246 may be embedded in the low-band portion 232 of the bitstream during encoding of the bitstream and may correspond to a ratio of harmonic energy to noise energy in the high band of the input signal 130. The low-band synthesizer 204 may determine the harmonicity parameter 246 based on a pitch gain value. The low-band synthesizer 204 may determine the pitch gain value based on the parameters 242. In a particular embodiment, the low-band synthesizer 204 may extract the harmonicity parameter 246 from the low-band portion 232 of the bitstream. For example, the mobile device 104 may include the harmonicity parameter 246 in the bitstream 132, as described with reference to FIG. 3.

The low-band synthesizer 204 may generate a synthesized low-band signal 234 based on the parameters 242 and the low-band excitation signal 244 using a particular low-band model. The low-band synthesizer 204 may provide the synthesized low-band signal 234 to the MUX 170.

The voicing factor generator 208 may receive the parameters 242 from the low-band synthesizer 204. The voicing factor generator 208 may generate a voicing factor 236 (e.g., a value from 0.0 to 1.0) based on the parameters 242, a previous voicing decision, one or more other factors, or a combination thereof. The voicing factor 236 may indicate a voiced/unvoiced nature (e.g., strongly voiced, weakly voiced, weakly unvoiced, or strongly unvoiced) of the input signal 130. The parameters 242 may include a zero crossing rate of the low-band signal of the input signal 130, a first reflection coefficient, a ratio of the energy of the adaptive codebook contribution in the low-band excitation to the energy of the sum of the adaptive codebook and fixed codebook contributions in the low-band excitation, a pitch gain of the low-band signal of the input signal 130, or a combination thereof. The voicing factor generator 208 may determine the voicing factor 236 based on Equation 1.

Voicing Factor = ∑ a_i * p_i + c, (Equation 1)

where i ∈ {0, …, M-1}, a_i and c are weights, p_i corresponds to a particular measured signal parameter, and M corresponds to the number of parameters used in the voicing factor determination.

In an illustrative embodiment, Voicing Factor = -0.4231*ZCR + 0.2712*FR + 0.0458*ACB_to_excitation + 0.1849*PG + 0.0138*prev_voicing_decision + 0.0611, where ZCR corresponds to the zero crossing rate, FR corresponds to the first reflection coefficient, ACB_to_excitation corresponds to the ratio of the energy of the adaptive codebook contribution in the low-band excitation to the energy of the sum of the adaptive codebook and fixed codebook contributions in the low-band excitation, PG corresponds to the pitch gain, and prev_voicing_decision corresponds to another voicing factor previously calculated for another frame. In a particular embodiment, the voicing factor generator 208 may use a higher threshold for classifying a frame as unvoiced than for classifying it as voiced. For example, the voicing factor generator 208 may classify a frame as unvoiced if the preceding frame was classified as unvoiced and the frame has a voicing value that satisfies a first threshold (e.g., a low threshold). The voicing factor generator 208 may determine the voicing value based on the zero crossing rate of the low-band signal of the input signal 130, the first reflection coefficient, the ratio of the energy of the adaptive codebook contribution in the low-band excitation to the energy of the sum of the adaptive codebook and fixed codebook contributions in the low-band excitation, the pitch gain of the low-band signal of the input signal 130, or a combination thereof. Alternatively, the voicing factor generator 208 may classify the frame as unvoiced if the voicing value of the frame satisfies a second threshold (e.g., a very low threshold). In a particular embodiment, the voicing factor 236 may correspond to the voicing classification 180 of FIG. 1.
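The weighted sum of Equation 1 and the threshold-based classification can be sketched in Python. The weights and offset are the ones quoted in the illustrative embodiment; the dictionary keys, the function names, the two threshold values (0.3 and 0.1), and the reading of "satisfies a threshold" as "is below the threshold" are assumptions introduced for the example.

```python
# Weights a_i and offset c as quoted in the illustrative embodiment.
WEIGHTS = {
    "zcr": -0.4231,                   # zero crossing rate
    "fr": 0.2712,                     # first reflection coefficient
    "acb_to_excitation": 0.0458,      # adaptive codebook energy ratio
    "pg": 0.1849,                     # pitch gain
    "prev_voicing_decision": 0.0138,  # voicing factor of an earlier frame
}
OFFSET = 0.0611


def voicing_factor(params):
    """Equation 1: weighted sum of measured parameters plus a constant."""
    return sum(WEIGHTS[name] * params[name] for name in WEIGHTS) + OFFSET


def is_unvoiced(value, previous_unvoiced, low_threshold=0.3, very_low_threshold=0.1):
    """Hysteresis sketch with hypothetical thresholds.

    A frame that follows an unvoiced frame only needs to fall below the low
    threshold; otherwise it must fall below the very low threshold.
    """
    if previous_unvoiced:
        return value < low_threshold
    return value < very_low_threshold
```

For example, a voicing value of 0.2 is classified as unvoiced when the preceding frame was unvoiced but not otherwise, which reduces toggling between classes on borderline frames.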

The excitation signal generator 222 may receive the low-band excitation signal 244 and the harmonicity parameter 246 from the low-band synthesizer 204 and may receive the voicing factor 236 from the voicing factor generator 208. The excitation signal generator 222 may generate the high-band excitation signal 186 based on the low-band excitation signal 244, the harmonicity parameter 246, and the voicing factor 236, as described with reference to FIGS. 1 and 4 to 7. For example, the envelope adjuster 162 may control an amount of the envelope of the low-band excitation signal 244 based on the voicing factor 236, as described with reference to FIGS. 1 and 4 to 7. In a particular embodiment, the signal envelope 182 may correspond to the controlled amount of the envelope. The envelope adjuster 162 may provide the signal envelope 182 to the modulator 164.

The modulator 164 may use the signal envelope 182 to modulate the white noise 156 to generate the modulated white noise 184, as described with reference to FIGS. 1 and 4 to 7. The modulator 164 may provide the modulated white noise 184 to the output circuit 166.

The output circuit 166 may generate the high-band excitation signal 186 by combining the modulated white noise 184 and another signal, as described with reference to FIGS. 1 and 4 to 7. In a particular embodiment, the output circuit 166 may combine the modulated white noise 184 and the other signal based on the harmonicity parameter 246, as described with reference to FIGS. 4 to 7.

The output circuit 166 may provide the high-band excitation signal 186 to the high-band synthesizer 168. The high-band synthesizer 168 may provide the synthesized high-band signal 188 to the MUX 170 based on the high-band excitation signal 186 and the high-band portion 218 of the bitstream. For example, the high-band synthesizer 168 may extract high-band parameters of the input signal 130 from the high-band portion 218 of the bitstream. The high-band synthesizer 168 may use the high-band parameters and the high-band excitation signal 186 to generate the synthesized high-band signal 188 based on a particular high-band model. In a particular embodiment, the MUX 170 may combine the synthesized low-band signal 234 and the synthesized high-band signal 188 to generate the output signal 116.

Thus, the decoder 200 of FIG. 2 may enable generation of a "smooth" sounding synthesized signal when the synthesized audio signal corresponds to an unvoiced (or strongly unvoiced) input signal. A synthesized high-band signal may be generated using a noise signal that is modulated based on a voicing classification of the input signal. The modulated noise signal may correspond more closely to the input signal when the input signal is strongly voiced than when the input signal is strongly unvoiced. In a particular embodiment, the synthesized high-band signal may have reduced or no sparseness when the input signal is strongly unvoiced, resulting in a smoother (e.g., having fewer artifacts) synthesized audio signal. In addition, determining the voicing classification (or voicing factor) based on a previous voicing decision may mitigate the effects of misclassification of a frame and may result in a smoother transition between voiced and unvoiced frames.

Referring to FIG. 3, a particular embodiment of an encoder operable to perform high-band excitation signal generation is disclosed and generally designated 300. In a particular embodiment, the encoder 300 may correspond to, or be included in, the system 100 of FIG. 1. For example, the encoder 300 may be included in the first device 102, the mobile device 104, or both. The encoder 300 may illustrate encoding of an audio signal at a transmitting device (e.g., the mobile device 104).

The encoder 300 includes a filter bank 302 coupled to a low-band encoder 304, the voicing factor generator 208, and the high-band encoder 172. The low-band encoder 304 may be coupled to the MUX 174. The low-band encoder 304 and the voicing factor generator 208 may be coupled to the high-band encoder 172 via the excitation signal generator 222. The high-band encoder 172 may be coupled to the MUX 174.

During operation, the filter bank 302 may receive the input signal 130. For example, the input signal 130 may be received by the mobile device 104 of FIG. 1 via the microphone 146. The filter bank 302 may separate the input signal 130 into multiple signals, including a low-band signal 334 and a high-band signal 340. For example, the filter bank 302 may generate the low-band signal 334 using a low-pass filter corresponding to a lower frequency sub-band (e.g., 50 Hz to 7 kHz) of the input signal 130 and may generate the high-band signal 340 using a high-pass filter corresponding to a higher frequency sub-band (e.g., 7 kHz to 16 kHz) of the input signal 130. The filter bank 302 may provide the low-band signal 334 to the low-band encoder 304 and may provide the high-band signal 340 to the high-band encoder 172.

The low-band encoder 304 may generate parameters 242 (e.g., low-band parameter information) and a low-band excitation signal 244 based on the low-band signal 334. For example, the parameters 242 may include low-band LPC coefficients, low-band LSFs, low-band line spectral pairs (LSPs), or a combination thereof. The low-band excitation signal 244 may correspond to a low-band residual signal. The low-band encoder 304 may generate the parameters 242 and the low-band excitation signal 244 based on a particular low-band model (e.g., a particular linear prediction model). For example, the low-band encoder 304 may generate the parameters 242 of the low-band signal 334 (e.g., filter coefficients corresponding to formants), may inverse-filter the low-band signal 334 based on the parameters 242, and may subtract the inverse-filtered signal from the low-band signal 334 to generate the low-band excitation signal 244 (e.g., the low-band residual signal of the low-band signal 334). The low-band encoder 304 may generate a low-band bitstream 342 that includes the parameters 242 and the low-band excitation signal 244. In a particular embodiment, the low-band bitstream 342 may include the harmonicity parameter 246. For example, the low-band encoder 304 may determine the harmonicity parameter 246, as described with reference to the low-band synthesizer 204 of FIG. 2.
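The residual computation described above can be illustrated with a minimal sketch. It assumes the textbook autocorrelation method with the Levinson-Durbin recursion, which the description does not specify; the function names and the choice of model order are hypothetical and are not taken from the encoder 304.

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of a list of samples."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p; returns [1, a1, ..., ap]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a


def lpc_residual(x, a):
    """Residual e[n] = x[n] - prediction[n], i.e. x filtered by A(z)."""
    residual = []
    for n in range(len(x)):
        # Linear prediction of x[n] from past samples.
        prediction = -sum(a[k] * x[n - k]
                          for k in range(1, len(a)) if n - k >= 0)
        residual.append(x[n] - prediction)
    return residual
```

For a signal that decays as 0.9 to the power n, the recursion recovers a first coefficient close to -0.9 and the residual collapses to a single impulse, which is the sense in which the excitation carries what the linear prediction model cannot explain.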

The low-band encoder 304 may provide the parameters 242 to the voicing factor generator 208 and may provide the low-band excitation signal 244 and the harmonicity parameter 246 to the excitation signal generator 222. The voicing factor generator 208 may determine the voicing factor 236 based on the parameters 242, as described with reference to FIG. 2. The excitation signal generator 222 may determine the high-band excitation signal 186 based on the low-band excitation signal 244, the harmonicity parameter 246, and the voicing factor 236, as described with reference to FIGS. 2 and 4 to 7.

The excitation signal generator 222 may provide the high-band excitation signal 186 to the high-band encoder 172. The high-band encoder 172 may generate the high-band bitstream 190 based on the high-band signal 340 and the high-band excitation signal 186, as described with reference to FIG. 1. The high-band encoder 172 may provide the high-band bitstream 190 to the MUX 174. The MUX 174 may combine the low-band bitstream 342 and the high-band bitstream 190 to generate the bitstream 132.

Thus, the encoder 300 may enable emulation of a decoder at a receiving device that generates a synthesized audio signal using a noise signal modulated based on a voicing classification of the input signal. The encoder 300 may generate high-band parameters (e.g., gain values) that are used to generate a synthesized audio signal that closely approximates the input signal 130.

FIGS. 4 to 7 are diagrams illustrating particular embodiments of methods of high-band excitation signal generation. Each of the methods of FIGS. 4 to 7 may be performed by one or more components of the systems 100 to 300 of FIGS. 1 to 3. For example, each of the methods of FIGS. 4 to 7 may be performed by one or more components of the high-band excitation signal generation module 122 of FIG. 1, the excitation signal generator 222 of FIG. 2 and/or FIG. 3, the voicing factor generator 208 of FIG. 2, or a combination thereof. FIGS. 4 to 7 illustrate alternative embodiments of methods of generating a high-band excitation signal that is represented in a transform domain, in a time domain, or in either the transform domain or the time domain.

Referring to FIG. 4, a diagram of a particular embodiment of a method of high-band excitation signal generation is shown and generally designated 400. The method 400 may correspond to generating a high-band excitation signal represented in a transform domain or a time domain.

The method 400 includes determining a voicing factor at 404. For example, the voicing factor generator 208 of FIG. 2 may determine the voicing factor 236 based on a representative signal 422. In a particular embodiment, the voicing factor generator 208 may determine the voicing factor 236 based on one or more other signal parameters. In a particular embodiment, several signal parameters may work in combination to determine the voicing factor 236. For example, the voicing factor generator 208 may determine the voicing factor 236 based on the low-band portion 232 of the bitstream (or the low-band signal 334 of FIG. 3), the parameters 242, a previous voicing decision, one or more other factors, or a combination thereof, as described with reference to FIGS. 2 and 3. The representative signal 422 may include the low-band portion 232 of the bitstream, the low-band signal 334, or an extended signal generated by extending the low-band excitation signal 244. The representative signal 422 may be represented in a transform (e.g., frequency) domain or a time domain. For example, the excitation signal generation module 122 may generate the representative signal 422 by applying a transform (e.g., a Fourier transform) to the input signal 130, the bitstream 132 of FIG. 1, the low-band portion 232 of the bitstream, the low-band signal 334, an extended signal generated by extending the low-band excitation signal 244 of FIG. 2, or a combination thereof.

The method 400 also includes calculating a low-pass filter (LPF) cutoff frequency at 408 and controlling the amount of the signal envelope at 410. For example, the envelope adjuster 162 of FIG. 1 may calculate an LPF cutoff frequency 426 based on the voicing factor 236. If the voicing factor 236 indicates strongly voiced audio, the LPF cutoff frequency 426 may be higher, indicating a higher influence of the harmonic component of the temporal envelope. When the voicing factor 236 indicates strongly unvoiced audio, the LPF cutoff frequency 426 may be lower, corresponding to a lower (or no) influence of the harmonic component of the temporal envelope.
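As a rough illustration of operation 408, the sketch below maps a voicing factor to a cutoff frequency by linear interpolation. The description only states that the cutoff is higher for strongly voiced audio and lower for strongly unvoiced audio; the normalization of the voicing factor to the range 0 to 1, the linear mapping, and the 50 Hz and 1000 Hz endpoints are all assumptions made for the example.

```python
def lpf_cutoff_hz(voicing_factor, low_hz=50.0, high_hz=1000.0):
    """Map a voicing factor (assumed 0 = strongly unvoiced, 1 = strongly voiced)
    to an envelope-tracking cutoff frequency in Hz.

    low_hz and high_hz are hypothetical endpoints chosen for illustration.
    """
    v = min(max(voicing_factor, 0.0), 1.0)   # clamp to the assumed range
    return low_hz + v * (high_hz - low_hz)
```

Any monotonically increasing mapping would satisfy the behavior described above; a linear one is simply the easiest to inspect.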

The envelope adjuster 162 may control the amount of the signal envelope 182 by controlling a characteristic (e.g., a frequency range) of the signal envelope 182. For example, the envelope adjuster 162 may control the characteristic of the signal envelope 182 by applying a low-pass filter 450 to the representative signal 422. The cutoff frequency of the low-pass filter 450 may be substantially equal to the LPF cutoff frequency 426. The envelope adjuster 162 may control the frequency range of the signal envelope 182 by tracking the temporal envelope of the representative signal 422 based on the LPF cutoff frequency 426. For example, the low-pass filter 450 may filter the representative signal 422 such that the filtered signal has a frequency range defined by the LPF cutoff frequency 426. To illustrate, the frequency range of the filtered signal may be below the LPF cutoff frequency 426. In a particular embodiment, the filtered signal may have an amplitude that matches the amplitude of the representative signal 422 below the LPF cutoff frequency 426 and may have a low amplitude (e.g., substantially equal to 0) above the LPF cutoff frequency 426.

Graph 470 illustrates an original spectral shape 482. The original spectral shape 482 may represent the signal envelope 182 of the representative signal 422. A first spectral shape 484 may correspond to a filtered signal generated by applying a filter having the LPF cutoff frequency 426 to the representative signal 422.

The LPF cutoff frequency 426 may determine a tracking speed. For example, the temporal envelope may be tracked faster (e.g., updated more frequently) when the voicing factor 236 indicates voiced audio than when the voicing factor 236 indicates unvoiced audio. In a particular embodiment, the envelope adjuster 162 may control the characteristic of the signal envelope 182 in the time domain. For example, the envelope adjuster 162 may control the characteristic of the signal envelope 182 sample by sample. In an alternative embodiment, the envelope adjuster 162 may control the characteristic of the signal envelope 182 represented in the transform domain. For example, the envelope adjuster 162 may control the characteristic of the signal envelope 182 by tracking the spectral shape based on the tracking speed. The envelope adjuster 162 may provide the signal envelope 182 to the modulator 164 of FIG. 1.
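The link between cutoff frequency and tracking speed can be sketched with a one-pole low-pass filter applied sample by sample to the rectified signal. This is a common envelope follower and is used here as an assumption; the description does not state the structure of the low-pass filter 450, and the function and parameter names are hypothetical.

```python
import math


def track_envelope(signal, cutoff_hz, sample_rate_hz):
    """Sample-by-sample temporal envelope using a one-pole low-pass filter.

    A higher cutoff gives a larger smoothing coefficient, so the envelope
    follows the signal faster; a lower cutoff tracks more slowly.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    envelope = []
    state = 0.0
    for sample in signal:
        state += alpha * (abs(sample) - state)   # move toward |sample|
        envelope.append(state)
    return envelope
```

With a step input, a tracker configured with a 1000 Hz cutoff at a 16 kHz sampling rate approaches the step within a few samples, whereas a 50 Hz cutoff leaves the envelope nearly flat over the same span.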

The method 400 further includes multiplying the signal envelope 182 by the white noise 156 at 412. For example, the modulator 164 of FIG. 1 may use the signal envelope 182 to modulate the white noise 156 to generate modulated white noise 184. The signal envelope 182 may modulate the white noise 156 represented in the transform domain or the time domain.

The method 400 also includes deciding a mixture at 406. For example, the modulator 164 of FIG. 1 may determine, based on the harmonicity parameter 246 and the voicing factor 236, a first gain (e.g., a noise gain 434) to be applied to the modulated white noise 184 and a second gain (e.g., a harmonic gain 436) to be applied to the representative signal 422. For example, the noise gain 434 (e.g., between 0 and 1) and the harmonic gain 436 may be calculated to match the ratio of harmonic to noise energy indicated by the harmonicity parameter 246. The modulator 164 may increase the noise gain 434 when the voicing factor 236 indicates strongly unvoiced audio and may decrease the noise gain 434 when the voicing factor 236 indicates strongly voiced audio. In a particular embodiment, the modulator 164 may determine the harmonic gain 436 based on the noise gain 434. In a particular embodiment, the harmonic gain

The method 400 further includes multiplying the modulated white noise 184 by the noise gain 434 at 414. For example, the output circuit 166 of FIG. 1 may generate scaled modulated white noise 438 by applying the noise gain 434 to the modulated white noise 184.

The method 400 also includes multiplying the representative signal 422 by the harmonic gain 436 at 416. For example, the output circuit 166 of FIG. 1 may generate a scaled representative signal 440 by applying the harmonic gain 436 to the representative signal 422.

The method 400 further includes adding the scaled modulated white noise 438 and the scaled representative signal 440 at 418. For example, the output circuit 166 of FIG. 1 may generate the high-band excitation signal 186 by combining (e.g., adding) the scaled modulated white noise 438 and the scaled representative signal 440. In an alternative embodiment, operation 414, operation 416, or both may be performed by the modulator 164 of FIG. 1. The high-band excitation signal 186 may be in the transform domain or the time domain.
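Operations 412 to 418 can be summarized in a short time-domain sketch. The noise gain is taken as an input; the relation used to derive the harmonic gain from it (the square root of one minus the squared noise gain, which keeps the sum of squared gains equal to one) is an assumption made for illustration and is not stated in the text above. All function and parameter names are hypothetical.

```python
import math


def high_band_excitation(representative, white_noise, envelope, noise_gain):
    """Mix envelope-modulated noise with a representative signal.

    noise_gain is expected in [0, 1]; the harmonic gain is derived from it
    with an assumed energy-preserving rule.
    """
    if not 0.0 <= noise_gain <= 1.0:
        raise ValueError("noise_gain must be between 0 and 1")
    harmonic_gain = math.sqrt(1.0 - noise_gain ** 2)            # assumed rule

    modulated = [e * w for e, w in zip(envelope, white_noise)]  # operation 412
    scaled_noise = [noise_gain * m for m in modulated]          # operation 414
    scaled_rep = [harmonic_gain * r for r in representative]    # operation 416
    return [n + r for n, r in zip(scaled_noise, scaled_rep)]    # operation 418
```

At a noise gain of 0 the output reduces to the representative signal, and at a noise gain of 1 it reduces to the modulated noise, which matches the two extremes of strongly voiced and strongly unvoiced input described for the mixture decision.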

Thus, the method 400 may enable the amount of the signal envelope to be controlled by controlling a characteristic of the envelope based on the voicing factor 236. In a particular embodiment, the proportion of the modulated white noise 184 and the representative signal 422 may be determined dynamically by gain factors (e.g., the noise gain 434 and the harmonic gain 436) based on the harmonicity parameter 246. The modulated white noise 184 and the representative signal 422 may be scaled such that the ratio of harmonic to noise energy of the high-band excitation signal 186 approximates the ratio of harmonic to noise energy of the high-band signal of the input signal 130.

In a particular embodiment, the method 400 of FIG. 4 may be implemented via hardware (e.g., a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), etc.) of a processing unit such as a central processing unit (CPU), a digital signal processor (DSP), or a controller, via a firmware device, or any combination thereof. As an example, the method 400 of FIG. 4 may be performed by a processor that executes instructions, as described with respect to FIG. 9.

Referring to FIG. 5, a diagram of a particular embodiment of a method of high-band excitation signal generation is shown and generally designated 500. The method 500 may include generating a high-band excitation signal by controlling the amount of a signal envelope represented in a transform domain, by modulating white noise represented in the transform domain, or both.

The method 500 includes operations 404, 406, 412, and 414 of the method 400. The representative signal 422 may be represented in a transform (e.g., frequency) domain, as described with reference to FIG. 4.

The method 500 also includes calculating a bandwidth expansion factor at 508. For example, the envelope adjuster 162 of FIG. 1 may determine a bandwidth expansion factor 526 based on the voicing factor 236. For example, the bandwidth expansion factor 526 may indicate greater bandwidth expansion when the voicing factor 236 indicates strongly voiced audio than when the voicing factor 236 indicates strongly unvoiced audio.

The method 500 further includes generating a spectrum by adjusting high-band LPC poles at 510. For example, the envelope adjuster 162 may determine LPC poles associated with the representative signal 422. The envelope adjuster 162 may control a characteristic of the signal envelope 182 by controlling the magnitude of the signal envelope 182, the shape of the signal envelope 182, the gain of the signal envelope 182, or a combination thereof. For example, the envelope adjuster 162 may control the magnitude, the shape, the gain, or a combination thereof, of the signal envelope 182 by adjusting the LPC poles based on the bandwidth expansion factor 526. In a particular embodiment, the LPC poles may be adjusted in the transform domain. The envelope adjuster 162 may generate the spectrum based on the adjusted LPC poles.
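One conventional way to adjust LPC poles with a single factor is to scale the k-th LPC coefficient by the k-th power of that factor, which multiplies the radius of every pole by the same amount. The sketch below assumes this technique; the description does not say how the bandwidth expansion factor 526 is applied, and `gamma`, `bandwidth_expand`, and `envelope_gain` are hypothetical names.

```python
import cmath


def bandwidth_expand(lpc, gamma):
    """Scale coefficient k of A(z) = [1, a1, ..., ap] by gamma**k.

    Every pole of 1/A(z) moves radially to gamma times its original radius.
    """
    return [c * gamma ** k for k, c in enumerate(lpc)]


def envelope_gain(lpc, omega):
    """Magnitude of the LPC spectral envelope |1/A(e^{j*omega})|."""
    a_val = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(lpc))
    return 1.0 / abs(a_val)
```

For example, with coefficients [1, -1.8, 0.81] (a double pole at radius 0.9), a factor of 0.9 moves the poles to radius 0.81 and lowers the envelope gain at zero frequency from about 100 to about 28, so the spectral peak is flattened while its location is kept.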

Graph 570 illustrates an original spectral shape 582. The original spectral shape 582 may represent the signal envelope 182 of the representative signal 422. The original spectral shape 582 may be generated based on the LPC poles associated with the representative signal 422. The envelope adjuster 162 may adjust the LPC poles based on the voicing factor 236. The envelope adjuster 162 may apply a filter corresponding to the adjusted LPC poles to the representative signal 422 to generate a filtered signal having a first spectral shape 584 or a second spectral shape 586. When the voicing factor 236 indicates strongly voiced audio, the first spectral shape 584 of the filtered signal may correspond to the adjusted LPC poles. When the voicing factor 236 indicates strongly unvoiced audio, the second spectral shape 586 of the filtered signal may correspond to the adjusted LPC poles.

The signal envelope 182 may correspond to the generated spectrum, the adjusted LPC poles, LPC coefficients associated with the representative signal 422 having the adjusted LPC poles, or a combination thereof. The envelope adjuster 162 may provide the signal envelope 182 to the modulator 164 of FIG. 1.

The modulator 164 may use the signal envelope 182 to modulate the white noise 156 to generate the modulated white noise 184, as described with reference to operation 412 of the method 400. The modulator 164 may modulate the white noise 156 represented in the transform domain. The output circuit 166 of FIG. 1 may generate the scaled modulated white noise 438 based on the modulated white noise 184 and the noise gain 434, as described with reference to operation 414 of the method 400.

The method 500 also includes multiplying a high-band LPC spectrum 542 by the representative signal 422 at 512. For example, the output circuit 166 of FIG. 1 may filter the representative signal 422 using the high-band LPC spectrum 542 to generate a filtered signal 544. In a particular embodiment, the output circuit 166 may determine the high-band LPC spectrum 542 based on high-band parameters (e.g., high-band LPC coefficients) associated with the representative signal 422. To illustrate, the output circuit 166 may determine the high-band LPC spectrum 542 based on the high-band portion 218 of the bitstream of FIG. 2 or based on high-band parameter information generated from the high-band signal 340 of FIG. 3.

The representative signal 422 may correspond to an extended signal generated from the low-band excitation signal 244 of FIG. 2. The output circuit 166 may synthesize the extended signal using the high-band LPC spectrum 542 to generate the filtered signal 544. The synthesis may be performed in the transform domain. For example, the output circuit 166 may perform the synthesis using multiplication in the frequency domain.

The method 500 further includes multiplying the filtered signal 544 by the harmonic gain 436 at 516. For example, the output circuit 166 of FIG. 1 may multiply the filtered signal 544 by the harmonic gain 436 to generate a scaled filtered signal 540. In a particular embodiment, operation 512, operation 516, or both may be performed by the modulator 164 of FIG. 1.

The method 500 also includes adding the scaled modulated white noise 438 and the scaled filtered signal 540 at 518. For example, the output circuit 166 of FIG. 1 may combine the scaled modulated white noise 438 and the scaled filtered signal 540 to generate the high-band excitation signal 186. The high-band excitation signal 186 may be represented in the transform domain.

Thus, the method 500 may enable the amount of the signal envelope to be controlled by adjusting the high-band LPC poles in the transform domain based on the voicing factor 236. In a particular embodiment, the proportion of the modulated white noise 184 and the filtered signal 544 may be determined dynamically by gains (e.g., the noise gain 434 and the harmonic gain 436) based on the harmonicity parameter 246. The modulated white noise 184 and the filtered signal 544 may be scaled such that the ratio of harmonic to noise energy of the high-band excitation signal 186 approximates the ratio of harmonic to noise energy of the high-band signal of the input signal 130.

In a particular embodiment, the method 500 of FIG. 5 may be implemented via hardware (e.g., a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), etc.) of a processing unit such as a central processing unit (CPU), a digital signal processor (DSP), or a controller, via a firmware device, or any combination thereof. As an example, the method 500 of FIG. 5 may be performed by a processor that executes instructions, as described with respect to FIG. 9.

Referring to FIG. 6, a diagram of a particular embodiment of a method of high-band excitation signal generation is shown and generally designated 600. The method 600 may include generating a high-band excitation signal by controlling the amount of a signal envelope in the time domain.

The method 600 includes operations 404, 406, and 414 of the method 400 and operation 508 of the method 500. The representative signal 422 and the white noise 156 may be in the time domain.

The method 600 also includes performing LPC synthesis at 610. For example, the envelope adjuster 162 of FIG. 1 may control a characteristic (e.g., shape, magnitude, and/or gain) of the signal envelope 182 by adjusting the coefficients of a filter based on the bandwidth expansion factor 526. In a particular embodiment, the LPC synthesis may be performed in the time domain. The coefficients of the filter may correspond to high-band LPC coefficients. The LPC filter coefficients may represent spectral peaks. Controlling the spectral peaks by adjusting the LPC filter coefficients may enable the degree of modulation of the white noise 156 to be controlled based on the voicing factor 236.

For example, the spectral peaks may be preserved when the voicing factor 236 indicates voiced speech. As another example, the spectral peaks may be smoothed, while the overall spectral shape is preserved, when the voicing factor 236 indicates unvoiced speech.

Graph 670 illustrates an original spectral shape 682. The original spectral shape 682 may represent the signal envelope 182 of the representative signal 422. The original spectral shape 682 may be generated based on the LPC filter coefficients associated with the representative signal 422. The envelope adjuster 162 may adjust the LPC filter coefficients based on the voicing factor 236. The envelope adjuster 162 may apply a filter corresponding to the adjusted LPC filter coefficients to the representative signal 422 to generate a filtered signal having a first spectral shape 684 or a second spectral shape 686. When the voicing factor 236 indicates strongly voiced audio, the first spectral shape 684 of the filtered signal may correspond to the adjusted LPC filter coefficients, and the spectral peaks may be preserved, as illustrated by the first spectral shape 684. When the voicing factor 236 indicates strongly unvoiced audio, the second spectral shape 686 may correspond to the adjusted LPC filter coefficients, and the overall spectral shape may be preserved while the spectral peaks are smoothed, as illustrated by the second spectral shape 686. The signal envelope 182 may correspond to the adjusted filter coefficients. The envelope adjuster 162 may provide the signal envelope 182 to the modulator 164 of FIG. 1.

The modulator 164 may use the signal envelope 182 (e.g., the adjusted filter coefficients) to modulate the white noise 156 to generate the modulated white noise 184. For example, the modulator 164 may apply a filter having the adjusted filter coefficients to the white noise 156 to generate the modulated white noise 184. The modulator 164 may provide the modulated white noise 184 to the output circuit 166 of FIG. 1. The output circuit 166 may multiply the modulated white noise 184 by the noise gain 434 to generate the scaled modulated white noise 438, as described with reference to operation 414 of FIG. 4.
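Applying a filter with adjusted coefficients to white noise in the time domain can be sketched as an all-pole (LPC synthesis) recursion. The example assumes the adjusted coefficients are given in the form [1, a1, ..., ap] and that the noise is Gaussian; neither detail is stated in the description, and the function names and the seed parameter are hypothetical.

```python
import random


def all_pole_filter(lpc, x):
    """Filter x through 1/A(z): y[n] = x[n] - sum_{k>=1} lpc[k] * y[n-k].

    lpc is [1, a1, ..., ap]; the leading coefficient is assumed to be 1.
    """
    y = []
    for n, sample in enumerate(x):
        acc = sample
        for k in range(1, len(lpc)):
            if n - k >= 0:
                acc -= lpc[k] * y[n - k]
        y.append(acc)
    return y


def shape_white_noise(adjusted_lpc, num_samples, seed=0):
    """Generate Gaussian white noise and shape it with the adjusted filter."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    return all_pole_filter(adjusted_lpc, white)
```

Coefficients whose poles lie close to the unit circle produce strongly peaked noise, while coefficients whose poles have been pulled toward the origin produce noise with a flatter spectrum, which corresponds to preserving or smoothing the spectral peaks as described above.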

The method 600 further includes performing high-band LPC synthesis at 612. For example, the output circuit 166 of FIG. 1 may synthesize the representative signal 422 to generate a synthesized high-band signal 614. The synthesis may be performed in the time domain. In a particular embodiment, the representative signal 422 may be generated by extending the low-band excitation signal. The output circuit 166 may generate the synthesized high-band signal 614 by applying a synthesis filter that uses the high-band LPC to the representative signal 422.

The method 600 also includes multiplying the synthesized high-band signal 614 by the harmonic gain 436 at 616. For example, the output circuit 166 of FIG. 1 may apply the harmonic gain 436 to the synthesized high-band signal 614 to generate a scaled synthesized high-band signal 640. In an alternative embodiment, the modulator 164 of FIG. 1 may perform operation 612, operation 616, or both.

The method 600 further includes adding the scaled modulated white noise 438 and the scaled synthesized high-band signal 640 at 618. For example, the output circuit 166 of FIG. 1 may combine the scaled modulated white noise 438 and the scaled synthesized high-band signal 640 to generate the high-band excitation signal 186.

Thus, the method 600 may enable the amount of the signal envelope to be controlled by adjusting the coefficients of a filter based on the voicing factor 236. In a particular embodiment, the proportion of the modulated white noise 184 and the synthesized high-band signal 614 may be determined dynamically based on the voicing factor 236. The modulated white noise 184 and the synthesized high-band signal 614 may be scaled such that the ratio of harmonic to noise energy of the high-band excitation signal 186 approximates the ratio of harmonic to noise energy of the high-band signal of the input signal 130.

In particular embodiments, method 600 of FIG. 6 may be implemented via hardware (e.g., a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), etc.) of a processing unit such as a central processing unit (CPU), a digital signal processor (DSP), or a controller, via a firmware device, or any combination thereof. As an example, method 600 of FIG. 6 may be performed by a processor that executes instructions, as described with respect to FIG. 9.

Referring to FIG. 7, a diagram of a particular embodiment of a method of high-band excitation signal generation is shown and generally designated 700. Method 700 may correspond to generating a high-band excitation signal by controlling an amount of signal envelope represented in a time domain or a transform (e.g., frequency) domain.

Method 700 includes operations 404, 406, 412, 414, and 416 of method 400. The representative signal 422 may be represented in a transform domain or a time domain. Method 700 also includes determining a signal envelope at 710. For example, the envelope adjuster 162 of FIG. 1 may generate the signal envelope 182 by applying a low-pass filter having constant coefficients to the representative signal 422.

Method 700 also includes determining a root-mean-square value at 702. For example, the modulator 164 of FIG. 1 may determine a root-mean-square energy of the signal envelope 182.

Method 700 further includes multiplying the root-mean-square value by the white noise 156 at 712. For example, the output circuit 166 of FIG. 1 may multiply the root-mean-square value by the white noise 156 to generate unmodulated white noise 736.

The modulator 164 of FIG. 1 may multiply the signal envelope 182 by the white noise 156 to generate the modulated white noise 184, as described with reference to operation 412 of method 400. The white noise 156 may be represented in a transform domain or a time domain.

Method 700 also includes determining, at 704, a proportion of gain for the modulated and unmodulated white noise. For example, the output circuit 166 of FIG. 1 may determine an unmodulated noise gain 734 and a modulated noise gain 732 based on the noise gain 434 and the voicing factor 236. If the voicing factor 236 indicates that the encoded audio signal corresponds to strongly voiced audio, the modulated noise gain 732 may correspond to a higher proportion of the noise gain 434. If the voicing factor 236 indicates that the encoded audio signal corresponds to strongly unvoiced audio, the unmodulated noise gain 734 may correspond to a higher proportion of the noise gain 434.
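One possible way to apportion the noise gain at 704 is sketched below. The linear split, the clamping of the voicing factor to the range 0 to 1, and the function name are assumptions for illustration only; the description does not specify the mapping.

```python
def split_noise_gain(noise_gain, voicing_factor):
    """Hypothetical linear split of the noise gain by voicing factor.

    Assumes a voicing factor near 1 means strongly voiced and near 0
    means strongly unvoiced; values outside [0, 1] are clamped.
    """
    v = min(max(voicing_factor, 0.0), 1.0)
    modulated_gain = noise_gain * v            # larger share when voiced
    unmodulated_gain = noise_gain * (1.0 - v)  # larger share when unvoiced
    return modulated_gain, unmodulated_gain
```

Under this assumed mapping the two gains always sum to the noise gain, so the split changes only the character of the noise and not its overall gain.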

Method 700 further includes multiplying the unmodulated noise gain 734 by the unmodulated white noise 736 at 714. For example, the output circuit 166 of FIG. 1 may apply the unmodulated noise gain 734 to the unmodulated white noise 736 to generate scaled unmodulated white noise 742.

The output circuit 166 may apply the modulated noise gain 732 to the modulated white noise 184 to generate scaled modulated white noise 740, as described with reference to operation 414 of method 400.

Method 700 also includes adding, at 716, the scaled unmodulated white noise 742 and the scaled modulated white noise 740 to produce scaled white noise 744. For example, the output circuit 166 of FIG. 1 may combine the scaled unmodulated white noise 742 and the scaled modulated white noise 740 to generate the scaled white noise 744.

Method 700 further includes adding the scaled white noise 744 to the scaled representative signal 440 at 718. For example, the output circuit 166 may combine the scaled white noise 744 and the scaled representative signal 440 to generate the high-band excitation signal 186. Method 700 may use the representative signal 422 and the white noise 156 represented in the transform (or time) domain to generate the high-band excitation signal 186 represented in the transform (or time) domain.
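As a non-limiting illustration, the noise path of method 700 (operations 702, 712, 714, 716, and 718, together with the modulation and scaling borrowed from operations 412 and 414) may be sketched in the time domain as follows. The function names and the list-based signal representation are assumptions for this sketch, and the two gains are taken as given inputs.

```python
import math


def rms(values):
    """Root-mean-square value of a sequence (operation 702)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def method_700_excitation(envelope, white_noise, scaled_representative,
                          modulated_gain, unmodulated_gain):
    """Hypothetical time-domain sketch of the mixing steps of method 700."""
    level = rms(envelope)                                       # operation 702
    unmodulated = [level * w for w in white_noise]              # operation 712
    modulated = [e * w for e, w in zip(envelope, white_noise)]  # as in operation 412
    scaled_noise = [modulated_gain * m + unmodulated_gain * u   # 414, 714, then 716
                    for m, u in zip(modulated, unmodulated)]
    return [n + r for n, r in zip(scaled_noise, scaled_representative)]  # operation 718
```

Multiplying the white noise by the RMS value of the envelope gives the unmodulated noise roughly the same overall level as the modulated noise, so shifting gain between the two changes temporal shape rather than loudness.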

Thus, method 700 may enable the proportion of the unmodulated white noise 736 and the modulated white noise 184 to be dynamically determined by gain factors (e.g., the unmodulated noise gain 734 and the modulated noise gain 732) based on the voicing factor 236. The high-band excitation signal 186 for strongly unvoiced audio may correspond to unmodulated white noise having fewer artifacts than a high-band signal corresponding to white noise modulated based on a sparsely coded low-band residual.

In particular embodiments, method 700 of FIG. 7 may be implemented via hardware (e.g., a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), etc.) of a processing unit such as a central processing unit (CPU), a digital signal processor (DSP), or a controller, via a firmware device, or any combination thereof. As an example, method 700 of FIG. 7 may be performed by a processor that executes instructions, as described with respect to FIG. 9.

Referring to FIG. 8, a flowchart of a particular embodiment of a method of high-band excitation signal generation is shown and generally designated 800. Method 800 may be performed by one or more components of the systems 100-300 of FIGS. 1-3. For example, method 800 may be performed by one or more components of the high-band excitation signal generation module 122 of FIG. 1, the excitation signal generator 222 of FIG. 2 or FIG. 3, the voicing factor generator 208 of FIG. 2, or a combination thereof.

Method 800 includes determining, at a device, a voicing classification of an input signal, at 802. The input signal may correspond to an audio signal. For example, the voicing classifier 160 of FIG. 1 may determine the voicing classification 180 of the input signal 130, as described with reference to FIG. 1. The input signal 130 may correspond to an audio signal.

Method 800 also includes controlling, at 804, an amount of an envelope of a representation of the input signal based on the voicing classification. For example, the envelope adjuster 162 of FIG. 1 may control an amount of an envelope of a representation of the input signal 130 based on the voicing classification 180, as described with reference to FIG. 1. The representation of the input signal 130 may be a low-band portion of a bit stream (e.g., the bit stream 232 of FIG. 2), a low-band signal (e.g., the low-band signal 334 of FIG. 3), an extended signal generated by extending a low-band excitation signal (e.g., the low-band excitation signal 244 of FIG. 2), another signal, or a combination thereof. For example, the representation of the input signal 130 may include the representative signal 422 of FIGS. 4-7.

Method 800 further includes modulating, at 806, a white noise signal based on the controlled amount of the envelope. For example, the modulator 164 of FIG. 1 may modulate the white noise 156 based on the signal envelope 182. The signal envelope 182 may correspond to the controlled amount of the envelope. To illustrate, the modulator 164 may modulate the white noise 156 in the time domain, such as in FIGS. 4 and 6-7. Alternatively, the modulator 164 may modulate the white noise 156 represented in a transform domain, such as in FIGS. 4-7.

Method 800 also includes generating, at 808, a high-band excitation signal based on the modulated white noise signal. For example, the output circuit 166 of FIG. 1 may generate the high-band excitation signal 186 based on the modulated white noise 184, as described with reference to FIG. 1.

Thus, method 800 of FIG. 8 may enable generation of a high-band excitation signal based on a controlled amount of an envelope of an input signal, where the amount of the envelope is controlled based on a voicing classification.

In particular embodiments, method 800 of FIG. 8 may be implemented via hardware (e.g., a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), etc.) of a processing unit such as a central processing unit (CPU), a digital signal processor (DSP), or a controller, via a firmware device, or any combination thereof. As an example, method 800 of FIG. 8 may be performed by a processor that executes instructions, as described with respect to FIG. 9.

Although the embodiments of FIGS. 1-8 describe generating a high-band excitation signal based on a low-band signal, in other embodiments the input signal 130 may be filtered to produce multiple band signals. For example, the multiple band signals may include a lower band signal, a medium band signal, a higher band signal, one or more additional band signals, or a combination thereof. The medium band signal may correspond to a higher frequency range than the lower band signal, and the higher band signal may correspond to a higher frequency range than the medium band signal. The lower band signal and the medium band signal may correspond to overlapping or non-overlapping frequency ranges. The medium band signal and the higher band signal may correspond to overlapping or non-overlapping frequency ranges.

The excitation signal generation module 122 may use a first band signal (e.g., the lower band signal or the medium band signal) to generate an excitation signal corresponding to a second band signal (e.g., the medium band signal or the higher band signal), where the first band signal corresponds to a lower frequency range than the second band signal.

In a particular embodiment, the excitation signal generation module 122 may use a first band signal to generate multiple excitation signals corresponding to multiple band signals. For example, the excitation signal generation module 122 may use the lower band signal to generate a medium band excitation signal corresponding to the medium band signal, a higher band excitation signal corresponding to the higher band signal, one or more additional band excitation signals, or a combination thereof.

Referring to FIG. 9, a block diagram of a particular illustrative embodiment of a device (e.g., a wireless communication device) is depicted and generally designated 900. In various embodiments, the device 900 may have fewer or more components than illustrated in FIG. 9. In an illustrative embodiment, the device 900 may correspond to the mobile device 104 or the first device 102 of FIG. 1. In an illustrative embodiment, the device 900 may operate according to one or more of methods 400-800 of FIGS. 4-8.

In a particular embodiment, the device 900 includes a processor 906 (e.g., a central processing unit (CPU)). The device 900 may include one or more additional processors 910 (e.g., one or more digital signal processors (DSPs)). The processors 910 may include a speech and music coder-decoder (codec) 908 and an echo canceller 912. The speech and music codec 908 may include the excitation signal generation module 122 of FIG. 1, the excitation signal generator 222, the voicing factor generator 208 of FIG. 2, a vocoder encoder 936, a vocoder decoder 938, or both the vocoder encoder 936 and the vocoder decoder 938. In a particular embodiment, the vocoder encoder 936 may include the high-band encoder 172 of FIG. 1, the low-band encoder 304 of FIG. 3, or both. In a particular embodiment, the vocoder decoder 938 may include the high-band synthesizer 168 of FIG. 1, the low-band synthesizer 204 of FIG. 2, or both.

As illustrated, the excitation signal generation module 122, the voicing factor generator 208, and the excitation signal generator 222 may be shared components that are accessible by the vocoder encoder 936 and the vocoder decoder 938. In other embodiments, one or more of the excitation signal generation module 122, the voicing factor generator 208, and/or the excitation signal generator 222 may be included in the vocoder encoder 936 and the vocoder decoder 938.

Although the speech and music codec 908 is illustrated as a component of the processors 910 (e.g., dedicated circuitry and/or executable programming code), in other embodiments one or more components of the speech and music codec 908, such as the excitation signal generation module 122, may be included in the processor 906, the codec 934, another processing component, or a combination thereof.

The device 900 may include a memory 932 and a codec 934. The device 900 may include a wireless controller 940 coupled to an antenna 942 via a transceiver 950. The device 900 may include a display 928 coupled to a display controller 926. A speaker 948, a microphone 946, or both may be coupled to the codec 934. In a particular embodiment, the speaker 948 may correspond to the speaker 142 of FIG. 1. In a particular embodiment, the microphone 946 may correspond to the microphone 146 of FIG. 1. The codec 934 may include a digital-to-analog converter (DAC) 902 and an analog-to-digital converter (ADC) 904.

In a particular embodiment, the codec 934 may receive analog signals from the microphone 946, convert the analog signals to digital signals using the analog-to-digital converter 904, and provide the digital signals to the speech and music codec 908 (e.g., in a pulse code modulation (PCM) format). The speech and music codec 908 may process the digital signals. In a particular embodiment, the speech and music codec 908 may provide digital signals to the codec 934. The codec 934 may convert the digital signals to analog signals using the digital-to-analog converter 902 and may provide the analog signals to the speaker 948.

The memory 932 may include instructions 956 executable by the processor 906, the processors 910, the codec 934, another processing unit of the device 900, or a combination thereof, to perform the methods and processes disclosed herein, such as one or more of methods 400-800 of FIGS. 4-8.

One or more components of the systems 100-300 may be implemented via dedicated hardware (e.g., circuitry), by a processor executing instructions to perform one or more tasks, or a combination thereof. As an example, the memory 932 or one or more components of the processor 906, the processors 910, and/or the codec 934 may be a memory device, such as a random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). The memory device may include instructions (e.g., the instructions 956) that, when executed by a computer (e.g., a processor in the codec 934, the processor 906, and/or the processors 910), may cause the computer to perform at least a portion of one or more of methods 400-800 of FIGS. 4-8. As an example, the memory 932 or the one or more components of the processor 906, the processors 910, and the codec 934 may be a non-transitory computer-readable medium that includes instructions (e.g., the instructions 956) that, when executed by a computer (e.g., a processor in the codec 934, the processor 906, and/or the processors 910), cause the computer to perform at least a portion of one or more of methods 400-800 of FIGS. 4-8.

In a particular embodiment, the device 900 may be included in a system-in-package or system-on-chip device (e.g., a mobile station modem (MSM)) 922. In a particular embodiment, the processor 906, the processors 910, the display controller 926, the memory 932, the codec 934, the wireless controller 940, and the transceiver 950 are included in the system-in-package or system-on-chip device 922. In a particular embodiment, an input device 930, such as a touchscreen and/or keypad, and a power supply 944 are coupled to the system-on-chip device 922. Moreover, in a particular embodiment, as illustrated in FIG. 9, the display 928, the input device 930, the speaker 948, the microphone 946, the antenna 942, and the power supply 944 are external to the system-on-chip device 922. However, each of the display 928, the input device 930, the speaker 948, the microphone 946, the antenna 942, and the power supply 944 may be coupled to a component of the system-on-chip device 922, such as an interface or a controller.

The device 900 may include a mobile communication device, a smart phone, a cellular phone, a laptop computer, a computer, a tablet computer, a personal digital assistant, a display device, a television, a gaming console, a music player, a radio, a digital video player, a digital video disc (DVD) player, a tuner, a camera, a navigation device, a decoder system, an encoder system, or any combination thereof.

In an illustrative embodiment, the processors 910 may be operable to perform all or a portion of the methods or operations described with reference to FIGS. 1-8. For example, the microphone 946 may capture an audio signal (e.g., the input signal 130 of FIG. 1). The ADC 904 may convert the captured audio signal from an analog waveform into a digital waveform composed of digital audio samples. The processors 910 may process the digital audio samples. A gain adjuster may adjust the digital audio samples. The echo canceller 912 may reduce echo that may have been created by an output of the speaker 948 entering the microphone 946.

The vocoder encoder 936 may compress digital audio samples corresponding to the processed speech signal and may form a transmit packet (e.g., a representation of the compressed bits of the digital audio samples). For example, the transmit packet may correspond to at least a portion of the bit stream 132 of FIG. 1. The transmit packet may be stored in the memory 932. The transceiver 950 may modulate some form of the transmit packet (e.g., other information may be appended to the transmit packet) and may transmit the modulated data via the antenna 942.

As a further example, the antenna 942 may receive incoming packets that include a receive packet. The receive packet may be sent by another device via a network. For example, the receive packet may correspond to at least a portion of the bit stream 132 of FIG. 1. The vocoder decoder 938 may decompress the receive packet. The decompressed waveform may be referred to as reconstructed audio samples. The echo canceller 912 may remove echo from the reconstructed audio samples.

The processors 910 executing the speech and music codec 908 may generate the high-band excitation signal 186, as described with reference to FIGS. 1-8. The processors 910 may generate the output signal 116 of FIG. 1 based on the high-band excitation signal 186. A gain adjuster may amplify or suppress the output signal 116. The DAC 902 may convert the output signal 116 from a digital waveform to an analog waveform and may provide the converted signal to the speaker 948.

In conjunction with the described embodiments, an apparatus is disclosed that includes means for determining a voicing classification of an input signal. The input signal may correspond to an audio signal. For example, the means for determining a voicing classification may include the voicing classifier 160 of FIG. 1, one or more devices configured to determine a voicing classification of an input signal (e.g., a processor executing instructions at a non-transitory computer-readable storage medium), or any combination thereof.

For example, the voicing classifier 160 may determine parameters 242 including a zero crossing rate of a low-band signal of the input signal 130, a first reflection coefficient, a ratio of the energy of an adaptive codebook contribution in the low-band excitation to the energy of the sum of the adaptive codebook and fixed codebook contributions in the low-band excitation, a pitch gain of the low-band signal of the input signal 130, or a combination thereof. In a particular embodiment, the voicing classifier 160 may determine the parameters 242 based on the low-band signal 334 of FIG. 3. In an alternative embodiment, the voicing classifier 160 may extract the parameters 242 from the low-band portion of the bit stream 232 of FIG. 2.

The voicing classifier 160 may determine the voicing classification 180 (e.g., the voicing factor 236) based on an equation. For example, the voicing classifier 160 may determine the voicing classification 180 based on Equation 1 and the parameters 242. To illustrate, the voicing classifier 160 may determine the voicing classification 180 by calculating a weighted sum of the zero crossing rate, the first reflection coefficient, the ratio of energy, the pitch gain, a previous voicing decision, a constant value, or a combination thereof, as described with reference to FIG. 4.
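A weighted sum of this kind may be sketched as follows. The parameter names, the weight values used in the example, and the clamping of the result to the range 0 to 1 are hypothetical; the actual weights of Equation 1 are not reproduced here.

```python
def voicing_from_parameters(params, weights, constant=0.0):
    """Weighted sum of voicing-related parameters plus a constant term.

    `params` and `weights` are dicts keyed by hypothetical parameter
    names; clamping to [0, 1] is an assumption of this sketch.
    """
    total = constant
    for name, weight in weights.items():
        total += weight * params[name]
    return min(max(total, 0.0), 1.0)


# Illustrative values only, not taken from the description.
example = voicing_from_parameters(
    {"zero_crossing_rate": 0.25, "pitch_gain": 0.5},
    {"zero_crossing_rate": -0.5, "pitch_gain": 1.0},
    constant=0.25,
)
```

In this illustration a high zero crossing rate lowers the result and a high pitch gain raises it, which matches the usual intuition that noisy frames are unvoiced and periodic frames are voiced.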

The apparatus also includes means for controlling an amount of an envelope of a representation of the input signal based on the voicing classification. For example, the means for controlling the amount of the envelope may include the envelope adjuster 162 of FIG. 1, one or more devices configured to control the amount of the envelope of the representation of the input signal based on the voicing classification (e.g., a processor executing instructions at a non-transitory computer-readable storage medium), or any combination thereof.

For example, the envelope adjuster 162 may generate a frequency voicing classification by multiplying the voicing classification 180 of FIG. 1 (e.g., the voicing factor 236 of FIG. 2) by a cut-off frequency scaling factor. The cut-off frequency scaling factor may be a default value. The LPF cut-off frequency 426 may correspond to a default cut-off frequency. The envelope adjuster 162 may control the amount of the signal envelope 182 by adjusting the LPF cut-off frequency 426, as described with reference to FIG. 4. For example, the envelope adjuster 162 may adjust the LPF cut-off frequency 426 by adding the frequency voicing classification to the LPF cut-off frequency 426.
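The cut-off adjustment may be sketched as follows. The one-pole smoother applied to the absolute value of the signal, the coefficient formula, and all numeric values are assumptions chosen for illustration; the description states only that the frequency voicing classification is added to the LPF cut-off frequency.

```python
import math


def adjusted_cutoff(voicing, default_cutoff_hz, cutoff_scale_hz):
    """Add the frequency voicing classification to the default cut-off."""
    frequency_voicing = voicing * cutoff_scale_hz
    return default_cutoff_hz + frequency_voicing


def envelope(signal, cutoff_hz, sample_rate_hz):
    """Hypothetical envelope follower: one-pole low-pass over |x|."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    state = 0.0
    out = []
    for x in signal:
        state += alpha * (abs(x) - state)  # move toward |x| at a rate set by alpha
        out.append(state)
    return out
```

A higher cut-off lets the envelope follow the signal more closely, so more of the temporal detail of the signal is imposed on the noise, whereas a lower cut-off yields a smoother, flatter envelope.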

As another example, the envelope adjuster 162 may generate the bandwidth expansion factor 526 by multiplying the voicing classification 180 of FIG. 1 (e.g., the voicing factor 236 of FIG. 2) by a bandwidth scaling factor. The envelope adjuster 162 may determine the high-band LPC poles associated with the representative signal 422. The envelope adjuster 162 may determine a pole adjustment factor by multiplying the bandwidth expansion factor 526 by a pole scaling factor. The pole scaling factor may be a default value. The envelope adjuster 162 may control the amount of the signal envelope 182 by adjusting the high-band LPC poles, as described with reference to FIG. 5. For example, the envelope adjuster 162 may adjust the high-band LPC poles toward the origin by the pole adjustment factor.
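One common way to move LPC poles toward the origin without factoring the polynomial is to scale the k-th coefficient by the k-th power of a factor between 0 and 1, which multiplies every pole radius by that factor. The sketch below assumes A(z) = 1 + sum of a_k z^-k and uses `gamma` in the role of the pole adjustment factor; how the description maps its factors onto such a value is not specified in this passage.

```python
def expand_bandwidth(lpc, gamma):
    """Return coefficients of A(z / gamma): a_k is scaled by gamma**k.

    `lpc` holds a_1..a_p for A(z) = 1 + sum_k a_k z^-k. With
    0 < gamma < 1, each pole of 1/A(z) moves toward the origin by the
    factor gamma, which widens formant bandwidths.
    """
    return [a * gamma ** (k + 1) for k, a in enumerate(lpc)]


# First-order example: the pole at z = 0.9 moves to z = 0.45.
moved = expand_bandwidth([-0.9], 0.5)
```

Poles closer to the origin give a shorter, flatter impulse response, so a filtered signal derived from these coefficients carries less pronounced envelope structure.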

As a further example, the envelope adjuster 162 may determine coefficients of a filter. The coefficients of the filter may be default values. The envelope adjuster 162 may determine a filter adjustment factor by multiplying the bandwidth expansion factor 526 by a filter scaling factor. The filter scaling factor may be a default value. The envelope adjuster 162 may control the amount of the signal envelope 182 by adjusting the coefficients of the filter, as described with reference to FIG. 6. For example, the envelope adjuster 162 may multiply each of the coefficients of the filter by the filter adjustment factor.

The apparatus further includes means for modulating a white noise signal based on the controlled amount of the envelope. For example, the means for modulating the white noise signal may include the modulator 164 of FIG. 1, one or more devices configured to modulate the white noise signal based on the controlled amount of the envelope (e.g., a processor executing instructions at a non-transitory computer-readable storage medium), or any combination thereof. For example, the modulator 164 may determine whether the white noise 156 and the signal envelope 182 are in the same domain. If the white noise 156 is in a different domain than the signal envelope 182, the modulator 164 may convert the white noise 156 to be in the same domain as the signal envelope 182 or may convert the signal envelope 182 to be in the same domain as the white noise 156. The modulator 164 may modulate the white noise 156 based on the signal envelope 182, as described with reference to FIG. 4. For example, the modulator 164 may multiply the white noise 156 and the signal envelope 182 in the time domain. As another example, the modulator 164 may convolve the white noise 156 and the signal envelope 182 in the frequency domain.
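The equivalence between the two modulation options can be illustrated with a small discrete Fourier transform (DFT) example. The sketch assumes an unnormalized DFT as the transform, for which sample-wise multiplication in the time domain corresponds to circular convolution of the spectra divided by the frame length; the function names and test values are hypothetical.

```python
import cmath


def dft(x):
    """Naive unnormalized DFT, adequate for short illustrative frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


def modulate_time(noise, envelope):
    """Time-domain modulation: sample-wise product."""
    return [w * e for w, e in zip(noise, envelope)]


def modulate_freq(noise_spec, envelope_spec):
    """Frequency-domain modulation: circular convolution scaled by 1/N."""
    n = len(noise_spec)
    return [c / n for c in circular_convolve(noise_spec, envelope_spec)]


noise = [1.0, -2.0, 0.5, 3.0]
env = [0.2, 0.4, 0.6, 0.8]
via_time = dft(modulate_time(noise, env))
via_freq = modulate_freq(dft(noise), dft(env))
```

Both paths give the same spectrum up to rounding error, so the choice of domain can follow whichever representation the surrounding processing already uses.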

The apparatus also includes means for generating a high-band excitation signal based on the modulated white noise signal. For example, the means for generating the high-band excitation signal may include the output circuit 166 of FIG. 1, one or more devices configured to generate the high-band excitation signal based on the modulated white noise signal (e.g., a processor executing instructions at a non-transitory computer-readable storage medium), or any combination thereof.

In a particular embodiment, the output circuit 166 may generate the high-band excitation signal 186 based on the modulated white noise 184, as described with reference to FIGS. 4-7. For example, the output circuit 166 may multiply the modulated white noise 184 by the noise gain 434 to generate the scaled modulated white noise 438, as described with reference to FIGS. 4-6. The output circuit 166 may combine the scaled modulated white noise 438 and another signal (e.g., the scaled representative signal 440 of FIG. 4, the scaled filtered signal 540 of FIG. 5, or the scaled synthesized high-band signal 640 of FIG. 6) to generate the high-band excitation signal 186.

As another example, the output circuit 166 may multiply the modulated white noise 184 by the modulated noise gain 732 of FIG. 7 to generate the scaled modulated white noise 740, as described with reference to FIG. 7. The output circuit 166 may combine (e.g., add) the scaled modulated white noise 740 and the scaled unmodulated white noise 742 to generate the scaled white noise 744. The output circuit 166 may combine the scaled representative signal 440 and the scaled white noise 744 to generate the high-band excitation signal 186.
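The variant with separate modulated and unmodulated noise paths can be sketched in the same way. The sketch assumes sample-wise addition for both combining steps and treats every gain as a caller-supplied value. The names `unmodulated_noise_gain` and `generate_excitation_two_noise_paths` are hypothetical, and the scaled representative signal is taken as an already-scaled input.

```python
def generate_excitation_two_noise_paths(
    modulated_noise,
    unmodulated_noise,
    scaled_representative,
    modulated_noise_gain,
    unmodulated_noise_gain,
):
    """Mix modulated and unmodulated noise, then add the representative signal.

    Assumption: every "combine" step is a sample-wise sum of equal-length frames.
    """
    lengths = {len(modulated_noise), len(unmodulated_noise), len(scaled_representative)}
    if len(lengths) != 1:
        raise ValueError("all signals must have the same length")

    # cf. scaled modulated white noise 740
    scaled_modulated = [modulated_noise_gain * x for x in modulated_noise]
    # cf. scaled unmodulated white noise 742
    scaled_unmodulated = [unmodulated_noise_gain * x for x in unmodulated_noise]
    # cf. scaled white noise 744
    scaled_white = [m + u for m, u in zip(scaled_modulated, scaled_unmodulated)]
    # cf. high-band excitation signal 186
    return [r + w for r, w in zip(scaled_representative, scaled_white)]


# Illustrative values only; they are not taken from the source.
excitation_fig7 = generate_excitation_two_noise_paths(
    modulated_noise=[0.5, -1.0],
    unmodulated_noise=[1.0, 2.0],
    scaled_representative=[0.25, 0.5],
    modulated_noise_gain=2.0,
    unmodulated_noise_gain=0.25,
)
# excitation_fig7 == [1.5, -1.0]
```

Keeping the two noise gains separate lets the proportion of envelope-shaped noise to flat noise be adjusted independently of the overall noise level.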

Those skilled in the art will further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software executed by a processing device (e.g., a hardware processor), or a combination of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether this functionality is implemented as hardware or executable software depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in a memory device, such as random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). An exemplary memory device is coupled to the processor such that the processor can read information from, and write information to, the memory device. In the alternative, the memory device may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or a user terminal.

The previous description of the disclosed embodiments is provided to enable a person skilled in the art to make or use the disclosed embodiments. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other embodiments without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.

Claims

1. A method for generating a high-frequency band excitation signal, comprising:

Extracting a voicing classification parameter of an input signal based on a received bitstream, wherein the input signal corresponds to an audio signal;

Controlling a frequency range of an envelope of a representation of the input signal based on the voicing classification parameter, wherein the frequency range is controlled based on a cutoff frequency of a low-pass filter applied to the representation of the input signal;

Modulating a white noise signal based on the controlled frequency range of the envelope; and

Generating the high-frequency band excitation signal, corresponding to a decoded version of the audio signal, based on the modulated white noise signal.

2. The method of claim 1, further comprising controlling the magnitude of the envelope.

3. The method of claim 1, further comprising controlling at least one of the shape of the envelope and the gain of the envelope.

4. The method of claim 3, wherein the degree of variation of the shape of the envelope is greater when the voicing classification parameter corresponds to strongly voiced than when the voicing classification parameter corresponds to strongly unvoiced.

5. The method of claim 1, wherein the voicing classification parameter indicates that the input signal is a strongly voiced signal, a weakly voiced signal, a weakly unvoiced signal, or a strongly unvoiced signal.

6. The method of claim 1, further comprising determining the cutoff frequency based on the voicing classification parameter.

7. The method of claim 1, wherein the cutoff frequency is greater when the voicing classification parameter corresponds to strongly voiced than when the voicing classification parameter corresponds to strongly unvoiced.

8. The method of claim 1, wherein the voicing classification parameter is extracted by a decoder.

9. The method of claim 1, wherein the frequency range of the envelope of the representation of the input signal is controlled, based on the voicing classification parameter, by a mobile communication device.

10. The method of claim 1, wherein the frequency range of the envelope of the representation of the input signal is controlled, based on the voicing classification parameter, by a fixed-position communication unit.

11. The method of claim 1, wherein controlling the frequency range of the envelope of the representation comprises adjusting the representation of the input signal in a transform domain.

12. The method of claim 1, wherein the representation of the input signal comprises a low-frequency band excitation signal of an encoded version of the audio signal or a high-frequency band excitation signal of an encoded version of the audio signal.

13. The method of claim 1, wherein the representation of the input signal comprises a harmonically extended excitation signal, and wherein the harmonically extended excitation signal is generated from a low-frequency band excitation signal of an encoded version of the audio signal.

14. The method of claim 1, further comprising:

Generating a scaled white noise signal by combining a scaled unmodulated white noise signal with a scaled modulated white noise signal, wherein the high-frequency band excitation signal is based on the scaled white noise signal.

15. The method of claim 1, wherein the envelope comprises a time-varying envelope, and the method further comprises updating the envelope more than once per frame of the input signal.

16. An apparatus for generating a high-frequency band excitation signal, comprising:

A voicing classifier configured to extract a voicing classification parameter of an input signal based on a received bitstream, wherein the input signal corresponds to an audio signal;

An envelope adjuster configured to control a frequency range of an envelope of a representation of the input signal based on the voicing classification parameter, wherein the frequency range is controlled based on a cutoff frequency of a low-pass filter applied to the representation of the input signal;

A modulator configured to modulate a white noise signal based on the controlled frequency range of the envelope; and

An output circuit configured to generate the high-frequency band excitation signal based on the modulated white noise signal.

17. The apparatus of claim 16, wherein the envelope adjuster is configured to control at least one of the shape of the envelope, the magnitude of the envelope, and the gain of the envelope based on the voicing classification parameter.

18. The apparatus of claim 17, wherein at least one of the shape of the envelope, the magnitude of the envelope, and the gain of the envelope is controlled by adjusting one or more poles of linear predictive coding (LPC) coefficients based on the voicing classification parameter.

19. The apparatus of claim 17, wherein at least one of the shape of the envelope, the magnitude of the envelope, and the gain of the envelope is configured to be controlled based on an adjusted coefficient of a filter, the adjusted coefficient being determined based on the voicing classification parameter, and wherein the modulator is configured to apply the filter to the white noise signal to generate the modulated white noise signal.

20. The apparatus of claim 16, further comprising:

An antenna; and

A receiver coupled to the antenna, the receiver being configured to receive the bitstream.

21. The apparatus of claim 20, wherein the receiver, the voicing classifier, the envelope adjuster, the modulator, and the output circuit are integrated into a mobile communication device.

22. The apparatus of claim 20, wherein the receiver, the voicing classifier, the envelope adjuster, the modulator, and the output circuit are integrated into a fixed-position communication unit.

23. The apparatus of claim 16, further comprising:

A high-frequency band encoder configured to encode the high-frequency band portion of the audio signal based on the high-frequency band excitation signal; and

A transmitter configured to transmit an encoded audio signal to another device, wherein the encoded audio signal is an encoded version of the audio signal.

24. A computer-readable storage device storing instructions that, when executed by at least one processor, cause said at least one processor to perform the following operations:

Generating a high-frequency band excitation signal based on the modulated white noise signal.

25. The computer-readable storage device of claim 24, wherein the instructions are further executable to cause the at least one processor to control the shape of the envelope based on the voicing classification parameter.

26. The computer-readable storage device of claim 24, wherein the instructions are further executable to cause the at least one processor to control at least one of the magnitude of the envelope and the gain of the envelope.

27. An apparatus for generating a high-frequency band excitation signal, comprising:

Means for extracting a voicing classification parameter of an input signal based on a received bitstream, wherein the input signal corresponds to an audio signal;

Means for controlling a frequency range of an envelope of a representation of the input signal based on the voicing classification parameter, wherein the frequency range is controlled based on a cutoff frequency of a low-pass filter applied to the representation of the input signal;

Means for modulating a white noise signal based on the controlled frequency range of the envelope; and

Means for generating the high-frequency band excitation signal based on the modulated white noise signal.

28. The apparatus of claim 27, wherein the representation of the input signal includes a low-frequency band excitation signal of the input signal, a high-frequency band excitation signal of the input signal, or a harmonically extended excitation signal, and wherein the harmonically extended excitation signal is generated from the low-frequency band excitation signal of the input signal.

29. The apparatus of claim 27, wherein the means for extracting, the means for controlling, the means for modulating, and the means for generating are integrated in a mobile communication device.

30. The apparatus of claim 27, wherein the means for extracting, the means for controlling, the means for modulating, and the means for generating are integrated in a fixed-position communication unit.