CN103124347A

CN103124347A - Method for guiding multi-view video coding quantization process by visual perception characteristics

Info

Publication number: CN103124347A
Application number: CN2012104020039A
Authority: CN
Inventors: 王永芳; 商习武; 刘静; 宋允东; 张兆杨
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2012-10-22
Filing date: 2012-10-22
Publication date: 2013-05-29
Anticipated expiration: 2032-10-22
Also published as: CN103124347B

Abstract

The present invention relates to a method for guiding the quantization process of multi-view video coding by using visual perception characteristics. The method comprises the following steps: (1) read the luminance values of each frame of the input video sequence and build a just noticeable distortion (JND) threshold model in the frequency domain; (2) apply intra-view and inter-view prediction to each frame of the input video sequence; (3) apply the discrete cosine transform (DCT) to the residual data; (4) dynamically adjust the quantization step size of each macroblock in the current frame; (5) dynamically adjust the Lagrangian parameter in the rate-distortion optimization process; (6) entropy-encode the quantized data to form a bitstream and transmit it over the network. The invention improves video compression efficiency while keeping the subjective quality essentially unchanged, which makes the bitstream better suited to network transmission.

Description

A method to guide the quantization process of multi-view video coding by using the characteristics of visual perception

Technical field

The invention relates to the technical field of multi-view video coding and decoding, and in particular to a method for guiding the quantization process of multi-view video coding by using visual perception characteristics. It is suitable for encoding and decoding high-definition 3D video signals.

Background art

As technology develops, viewers demand ever better audio-visual experiences and are no longer satisfied with conventional single-view two-dimensional video. Expectations for stereoscopic experience keep rising, from stereoscopic perception at a fixed viewing angle to stereoscopic perception from any angle, which has driven the development of multi-view video coding. Multi-view video, however, greatly increases the amount of data, so improving video compression efficiency has become an active research topic. Current video compression techniques concentrate on removing three kinds of redundancy: spatial, temporal and statistical. Video experts have introduced a new generation of video coding technology (HEVC) that is expected to double the compression efficiency of H.264. Nevertheless, owing to the characteristics of the human visual system (HVS), perceptual redundancy still remains. As research on human visual characteristics has deepened, researchers have proposed the just noticeable distortion (JND) model for removing this perceptual redundancy: the JND threshold measures the amount of perceptual redundancy, and a change smaller than the threshold is not perceived by the human eye.

Research on JND falls into two main categories: pixel-domain models and frequency-domain models. The model proposed in reference [1] is a classic pixel-domain model that considers luminance masking, texture masking and temporal masking. The frequency-domain model proposed in reference [2] covers these three characteristics and also accounts for the sensitivity of the human eye to different frequency bands, so it matches human visual characteristics more closely.

The model of reference [2] is currently a relatively complete DCT-domain JND model. Besides luminance masking and texture masking, it adds the effect of the spatial sensitivity function. This function reflects the band-pass characteristic of the human eye, and removing the frequency components that the eye cannot perceive removes perceptual redundancy in the frequency domain. The temporal masking effect incorporates smooth-pursuit eye movement and takes into account both the magnitude and the direction of motion. Some researchers have combined this model with multi-view video by applying it after the DCT of the residual, which greatly improved compression efficiency. However, the model was not applied to other coding stages such as quantization, so visual redundancy was not removed thoroughly.

Reference [3] does propose using a JND model to guide quantization. However, its JND model is built in the pixel domain and lacks the step that removes the frequency redundancy of the human eye, so its guidance of the quantization process is not precise enough. Moreover, since the JND model safeguards subjective quality, the quantization value needs to be adjusted only in regions to which the human eye is insensitive, while it stays unchanged elsewhere. Finally, the Lagrangian parameter should be adjusted together with the quantization parameter.

The present patent application is the first to propose applying a DCT-domain JND model to the quantization process of multi-view video coding, further improving compression efficiency while keeping subjective quality unchanged.

Reference [1]: X. Yang, W. Lin, and Z. Lu, "Motion-compensated residue preprocessing in video coding based on just-noticeable-distortion profile," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 6, pp. 742–752, 2005.

Reference [2]: Z. Wei and K. N. Ngan, "Spatio-temporal just noticeable distortion profile for grey scale image/video in DCT domain," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 3, Mar. 2009.

Reference [3]: Z. Chen and C. Guillemot, "Perceptually friendly H.264/AVC video coding based on foveated just noticeable distortion model," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 6, pp. 806–819, Jun. 2010.

Summary of the invention

The purpose of the present invention is to overcome the shortcomings of the prior art by providing a method for guiding the quantization process of multi-view video coding by using visual perception characteristics. While keeping the subjective quality of the video unchanged, the method uses a frequency-domain JND model to guide multi-view quantization: the quantization step size is increased in regions to which the human eye is insensitive, which improves compression efficiency. Together with the step size, the Lagrangian parameter of the rate-distortion optimization function is adjusted dynamically, which improves coding efficiency further.

To achieve this purpose, the invention adopts the following technical solution:

A method for guiding the quantization process of multi-view video coding by using visual perception characteristics, characterized by the following steps:

(1) Read the luminance values of each frame of the input video sequence and build a just noticeable distortion (JND) threshold model in the frequency domain.

(2) Apply intra-view and inter-view prediction to each frame of the input video sequence.

(3) Apply the discrete cosine transform (DCT) to the residual data.

(4) Dynamically adjust the quantization step size of each macroblock in the current frame.

(5) Dynamically adjust the Lagrangian parameter in the rate-distortion optimization process.

(6) Entropy-encode the quantized data to form a bitstream and transmit it over the network.

Compared with the prior art, the method of the invention for guiding the quantization process of multi-view video coding by using visual perception characteristics has the following prominent substantive features and notable technical advances:

1) The method lowers the coding bit rate through the quantization stage alone while keeping the quality of the reconstructed video unchanged; in the experiments the bit rate was reduced by up to 12.35%.

2) Subjective quality is assessed with the average subjective score difference: the closer this difference is to 0, the closer the subjective quality of the two methods being compared. The average subjective score difference of the method is 0.03, so its subjective quality is comparable to that of the multi-view reference codec JMVC.

3) The method adds no particularly complex coding stage, so it improves compression efficiency at low additional complexity.

Description of the drawings

Fig. 1 is a block diagram of the method of the invention for guiding the quantization process of multi-view video coding by using visual perception characteristics.

Fig. 2 is a block diagram of the frequency-domain just noticeable distortion model.

Fig. 3 is a block diagram of intra-view/inter-view prediction.

Fig. 4 is a block diagram of the DCT.

Fig. 5 is a block diagram of the dynamic adjustment of the quantization step size.

Fig. 6 is a block diagram of the dynamic adjustment of the Lagrangian parameter in the rate-distortion cost function.

Fig. 7 is a block diagram of the entropy coding output.

Fig. 8a is the reconstructed image of frame 15 of view 0 of the video sequence ballroom, coded with the original JMVC coding method.

Fig. 8b is the reconstructed image of frame 15 of view 0 of the video sequence ballroom, coded with the method of the invention.

Fig. 9 compares, for the video sequence ballroom at different QPs and views, the bit rate, the PSNR and the difference in subjective quality score of the reconstructed video (DMOS) between the original JMVC coding method and the method of the invention.

Fig. 10a is the reconstructed image of frame 35 of view 1 of the video sequence race1, coded with the original JMVC coding method.

Fig. 10b is the reconstructed image of frame 35 of view 1 of the video sequence race1, coded with the method of the invention.

Fig. 11 compares, for the video sequence race1 at different QPs and views, the bit rate, the PSNR and the difference in subjective quality score of the reconstructed video (DMOS) between the original JMVC coding method and the method of the invention.

Fig. 12a is the reconstructed image of frame 45 of view 2 of the video sequence Crowd, coded with the original JMVC coding method.

Fig. 12b is the reconstructed image of frame 45 of view 2 of the video sequence Crowd, coded with the method of the invention.

Fig. 13 compares, for the video sequence Crowd at different QPs and views, the bit rate, the PSNR and the difference in subjective quality score of the reconstructed video (DMOS) between the original JMVC coding method and the method of the invention.

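Detailed description of the embodiments

Preferred embodiments of the invention are described in further detail below with reference to the drawings.

Embodiment 1:

In this embodiment, the method for guiding the quantization process of multi-view video coding by using visual perception characteristics (see Fig. 1) comprises the following steps:

(1) Read the luminance values of each frame of the input video sequence and build a just noticeable distortion (JND) threshold model in the frequency domain.

(2) Apply intra-view and inter-view prediction to each frame of the input video sequence.

(3) Apply the discrete cosine transform to the residual data.

(4) Dynamically adjust the quantization step size of each macroblock in the current frame.

(5) Dynamically adjust the Lagrangian parameter in the rate-distortion optimization process.

(6) Entropy-encode the quantized data to form a bitstream and transmit it over the network.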

Embodiment 2: This embodiment is essentially the same as Embodiment 1, with the following particulars.

Building the frequency-domain JND model in step (1) involves four sub-models (see Fig. 2):

(1-1) The spatial contrast sensitivity function model follows the band-pass characteristic curve of the human eye and assigns a basic JND threshold to each spatial frequency. The spatial frequency of a DCT coefficient is computed from the coordinate position of the coefficient within the DCT block, the dimension of the DCT block, and the horizontal and vertical visual angles of a pixel; the horizontal and vertical visual angles are generally taken to be equal.

The visual sensitivity of the human eye is directional: it is higher in the horizontal and vertical directions and lower in other directions. The basic threshold therefore includes a directional modulation factor that depends on the angle of the frequency represented by the DCT coefficient vector, and it is scaled by the normalization factors of the DCT coefficients. Finally, a control parameter is applied to form the final modulation factor of the spatial sensitivity function.

Because multi-view coding uses both 8×8 and 4×4 DCT transforms, the parameters differ between the two. In the experiments, the four model parameters are set to 0.6, 1.33, 0.11 and 0.18, respectively, for the 8×8 DCT, and to 0.6, 0.8, 0.035 and 0.008, respectively, for the 4×4 DCT.
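The equations of (1-1) are not reproduced in this text. As a minimal sketch, assuming the model follows reference [2] unchanged, the basic threshold can be computed as below. The parameter names r, a, b and c, the control parameter s = 0.25, the form of the pixel visual angle and the handling of the DC coefficient are all assumptions taken from or added to that model, not values confirmed by this text; only the eight numeric parameter values come from the description above.

```python
import math

# Parameter values quoted in the text for the 8x8 and 4x4 DCT. The names
# r, a, b, c follow reference [2] and are an assumption here.
CSF_PARAMS = {
    8: {"r": 0.6, "a": 1.33, "b": 0.11, "c": 0.18},
    4: {"r": 0.6, "a": 0.8, "b": 0.035, "c": 0.008},
}


def pixel_visual_angle(viewing_ratio, picture_height):
    """Visual angle of one pixel in degrees (assumed form, after [2]).

    viewing_ratio: viewing distance divided by picture height (hypothetical name).
    picture_height: picture height in pixels.
    """
    return math.degrees(2.0 * math.atan(1.0 / (2.0 * viewing_ratio * picture_height)))


def spatial_frequency(i, j, n, theta):
    """Spatial frequency (cycles/degree) of coefficient (i, j) in an n x n block,
    with equal horizontal and vertical pixel visual angles theta."""
    return math.hypot(i / theta, j / theta) / (2.0 * n)


def directional_factor(i, j, r):
    """Oblique-effect term 1 / (r + (1 - r) * cos^2(phi)); equals 1 on the axes."""
    if i == 0 and j == 0:
        return 1.0  # assumption: no directional modulation for the DC term
    # With equal visual angles, sin(phi) reduces to 2ij / (i^2 + j^2).
    sin_phi = min(1.0, 2.0 * i * j / float(i * i + j * j))
    return 1.0 / (r + (1.0 - r) * (1.0 - sin_phi * sin_phi))


def basic_threshold(i, j, n, theta, s=0.25):
    """Basic JND threshold of coefficient (i, j); s = 0.25 is assumed from [2]."""
    p = CSF_PARAMS[n]
    w = spatial_frequency(i, j, n, theta)
    phi_i = math.sqrt((1.0 if i == 0 else 2.0) / n)  # DCT normalization factors
    phi_j = math.sqrt((1.0 if j == 0 else 2.0) / n)
    csf = math.exp(p["c"] * w) / (p["a"] + p["b"] * w)
    return s * csf * directional_factor(i, j, p["r"]) / (phi_i * phi_j)
```

Under these assumptions the threshold grows towards high and diagonal frequencies, which is the band-pass and directional behaviour the paragraph describes.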

(1-2) The luminance masking model is based on experiments showing that the human eye is more sensitive to distortion in regions of mid-range grey level than in darker or brighter background regions. A luminance masking curve is fitted to these results; it is a function of the average luminance value of the current coding block.
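The fitted curve itself is not reproduced in this text. A minimal sketch, assuming the piecewise-linear fit of reference [2] for 8-bit luminance, is given below; the breakpoints 60 and 170 and the slopes are assumptions taken from that reference, not from this description.

```python
def luminance_factor(mean_luma):
    """Luminance masking factor for a block with the given average 8-bit luminance.

    Assumed fit after reference [2]: the factor is 1 in the mid-grey range and
    rises linearly towards dark and bright backgrounds.
    """
    if mean_luma <= 60:
        return (60.0 - mean_luma) / 150.0 + 1.0
    if mean_luma >= 170:
        return (mean_luma - 170.0) / 425.0 + 1.0
    return 1.0
```

A factor above 1 raises the JND threshold, so dark and bright blocks tolerate more distortion than mid-grey blocks.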

(1-3) The texture masking model divides the image into three kinds of region according to its texture: edge regions, smooth regions and texture regions, to which the human eye is successively less sensitive. The regions are usually separated with the Canny operator.

The edge pixel density of a block is obtained from the total number of edge pixels in the block, as found by the Canny edge detector. Based on this density, image blocks are classified as plain, texture or edge blocks.

In texture regions the eye is insensitive to distortion in the low-frequency part, whereas the high-frequency part should be preserved appropriately. An estimation factor for contrast masking is derived accordingly; it is indexed by the position of the DCT coefficient.

Taking into account the overlap with the spatial contrast sensitivity function effect and the luminance effect, the final masking factor is obtained. It depends on the index of the frame within the input video sequence, the DCT coefficient, the threshold of the spatial contrast sensitivity function and the modulation factor of the luminance masking effect.
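The classification rule and the masking factor are not reproduced in this text. The sketch below assumes the rules of reference [2]: density thresholds of 0.1 and 0.2, estimation factors of 2.25 and 1.25 in texture blocks, a low-frequency boundary at i² + j² = 16, and an exponent of 0.36. All of these constants are assumptions, and the edge pixel count is taken as an input rather than computed with a Canny detector.

```python
def classify_block(edge_pixels, n):
    """Classify an n x n block from its Canny edge pixel count (assumed thresholds)."""
    density = edge_pixels / float(n * n)
    if density <= 0.1:
        return "plain"
    if density <= 0.2:
        return "edge"
    return "texture"


def estimation_factor(block_type, i, j):
    """Contrast masking estimation factor for coefficient (i, j) (assumed values)."""
    if block_type != "texture":
        return 1.0
    # Texture blocks tolerate more distortion at low frequencies.
    return 2.25 if i * i + j * j <= 16 else 1.25


def masking_factor(block_type, i, j, coeff, t_basic, f_lum):
    """Final masking factor combining the estimation factor with the ratio of the
    DCT coefficient to the luminance-adjusted basic threshold."""
    psi = estimation_factor(block_type, i, j)
    if block_type in ("plain", "edge") and i * i + j * j <= 16:
        return psi
    ratio = abs(coeff) / (t_basic * f_lum)
    return psi * min(4.0, max(1.0, ratio ** 0.36))
```

The clamp to the range 1 to 4 keeps a very large coefficient from inflating the threshold without bound.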

(1-4) The temporal contrast sensitivity function model gives the modulation factor of the temporal masking effect, determined experimentally as a function of the temporal frequency and the spatial frequency.

The temporal frequency is generally computed from the horizontal and vertical components of the spatial frequency and the velocity of the object's motion on the retina. The components of the spatial frequency are in turn computed from the horizontal and vertical visual angles of a pixel, the dimension of the DCT and the coordinate position within the DCT block.

The velocity of the image on the retina is computed from the following quantities: the gain of smooth-pursuit eye movement, set to 0.98 in the experiments; the velocity of the object in the image plane; the minimum eye velocity caused by drift movement, with an empirical value of 0.15 deg/s; the maximum eye velocity, corresponding to saccadic eye movement, usually taken as 80 deg/s; the frame rate of the video sequence; the motion vector of each block; and the visual angle of a pixel.
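The temporal equations are not reproduced in this text. The sketch below assumes the eye-movement model and piecewise temporal factor of reference [2]; the three constants come from the description above, while the formulas, the 5 cycles/degree and 10 Hz breakpoints, the base 1.07 and the sign handling are assumptions.

```python
import math

G_SPEM = 0.98  # smooth-pursuit eye movement gain (from the text)
V_MIN = 0.15   # minimum eye velocity due to drift, deg/s (from the text)
V_MAX = 80.0   # maximum eye velocity, saccadic movement, deg/s (from the text)


def retinal_velocity(mv, frame_rate, theta):
    """Retinal velocity (deg/s) for one motion vector component.

    mv: motion vector component in pixels per frame.
    theta: visual angle of one pixel in degrees.
    """
    v_image = frame_rate * mv * theta  # object velocity in the image plane
    # The eye tracks the object up to V_MAX; the sign follows the motion direction.
    v_eye = math.copysign(min(G_SPEM * abs(v_image) + V_MIN, V_MAX), v_image)
    return v_image - v_eye


def temporal_frequency(i, j, n, theta, v_x, v_y):
    """Temporal frequency (Hz) of coefficient (i, j) in an n x n block."""
    f_sx = i / (2.0 * n * theta)  # horizontal spatial frequency component
    f_sy = j / (2.0 * n * theta)  # vertical spatial frequency component
    return f_sx * v_x + f_sy * v_y


def temporal_factor(f_s, f_t):
    """Temporal masking modulation factor (assumed piecewise form after [2])."""
    f_t = abs(f_t)  # assumption: only the magnitude of the temporal frequency matters
    if f_s < 5.0:
        return 1.0 if f_t < 10.0 else 1.07 ** (f_t - 10.0)
    return 1.07 ** f_t
```

Because the eye velocity saturates at 80 deg/s, fast motion leaves a large retinal velocity and therefore a larger temporal masking factor.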

(1-5) The weighted product of the four factors forms the JND threshold of the current coded frame. The four factors are the threshold of the spatial contrast sensitivity function, the modulation factor of the luminance masking effect, the modulation factor of the masking effect obtained in (1-3) and the modulation factor of temporal masking.

Step (2) applies inter-view and intra-view prediction to the input video sequence (see Fig. 3), as follows:

(2-1) Within a view, inter-frame prediction removes the temporal redundancy of the current frame and intra-frame prediction removes its spatial redundancy. Of the two, the prediction mode with the smaller rate-distortion cost is selected. The rate-distortion cost combines the distortion with the number of bits needed to code the block in the given coding mode, weighted by the adjusted Lagrangian parameter.

(2-2) Because the method codes multiple views, the current frame is also predicted from the corresponding frames of other views, which removes inter-view redundancy.

(2-3) The intra-view and inter-view coding costs are compared: the best intra-view prediction mode is selected first and then compared with the inter-view prediction mode, and the mode with the smallest rate-distortion cost is chosen as the best prediction mode. Taking both intra-view and inter-view redundancy into account when choosing the prediction mode further improves compression efficiency.
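The two-stage decision of (2-1) to (2-3) can be sketched as follows, assuming the usual Lagrangian cost J = D + λ·R. The function names, mode names and the numbers in the usage note are hypothetical and purely illustrative.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed standard form)."""
    return distortion + lam * bits


def select_prediction(intra_view_modes, inter_view_mode, lam):
    """Pick the best prediction mode in two stages.

    intra_view_modes: dict mapping a mode name to (distortion, bits) for the
        intra-frame and inter-frame candidates within the view.
    inter_view_mode: (name, distortion, bits) of the inter-view candidate.
    """
    # Stage 1: best mode within the view.
    best_name = min(intra_view_modes,
                    key=lambda m: rd_cost(*intra_view_modes[m], lam))
    best_cost = rd_cost(*intra_view_modes[best_name], lam)
    # Stage 2: compare the intra-view winner with the inter-view candidate.
    name, distortion, bits = inter_view_mode
    if rd_cost(distortion, bits, lam) < best_cost:
        return name
    return best_name
```

For example, with candidates {"intra": (1200.0, 90), "inter_frame": (900.0, 60)} and an inter-view candidate ("inter_view", 950.0, 40), a Lagrangian parameter of 10 favours the inter-view mode because its lower bit count outweighs its slightly higher distortion, whereas a parameter of 0 favours inter-frame prediction.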

Step (3) applies the discrete cosine transform to the residual data (see Fig. 4), as follows:

(3-1) Decide the coding block size. The multi-view coding method uses seven coding block sizes; the first four are treated as 8×8 transform blocks and the last three as 4×4 transform blocks.

(3-2) Apply the corresponding DCT: an 8×8 DCT for 8×8 transform blocks and a 4×4 DCT for 4×4 transform blocks.
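A minimal sketch of step (3) follows. The seven block sizes are not listed in this text; the sketch assumes they are the H.264 partition sizes 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4. A floating-point orthonormal DCT-II is used for illustration in place of the integer transform that a real H.264/MVC encoder applies.

```python
import math


def transform_size(width, height):
    """Map a coding block size to its DCT size: partitions of at least 8x8 use
    the 8x8 transform, smaller partitions use the 4x4 transform (assumed mapping)."""
    return 8 if min(width, height) >= 8 else 4


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0.0
            for x in range(n):
                for y in range(n):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * acc
    return out
```

For a constant residual block all the energy lands in the DC coefficient, which is why smooth residuals compress well after quantization.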

Step (4) dynamically adjusts the quantization step size of each macroblock in the current frame (see Fig. 5), as follows:

(4-1) Using the JND model already built, compute the average JND value of the current frame: the JND threshold of the current frame is averaged over all pixel coordinates, that is, over the height and width of the image frame.

(4-2) Compute the JND mean of the current macroblock, that is, the average JND threshold of the M-th macroblock.

(4-3) Dynamically adjust the quantization step size of the current macroblock. The JND threshold reflects how the sensitivity of the human eye varies across an image, so the quantization step size of each macroblock can be adjusted according to its JND threshold. Where the eye is insensitive, the step size is increased appropriately; elsewhere the quantization value is left unchanged. The adjusted quantization parameter is obtained from the original step size of the coding framework and an adjustment factor.
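The averaging in (4-1) and (4-2) and the one-sided adjustment in (4-3) can be sketched as below. The expression for the adjustment factor is not recoverable from this text, so `adjust_qp` uses a hypothetical rule: the step size is scaled by the ratio of the macroblock mean to the frame mean, converted to a QP offset through the H.264 relation that the step size doubles every 6 QP, and capped by an assumed `max_delta`. The cap and the function names are illustrative only.

```python
import math


def frame_mean_jnd(jnd_map):
    """Average JND threshold over a frame given as a list of rows."""
    height, width = len(jnd_map), len(jnd_map[0])
    return sum(sum(row) for row in jnd_map) / float(height * width)


def macroblock_mean_jnd(jnd_map, mb_row, mb_col, mb_size=16):
    """Average JND threshold over the macroblock at (mb_row, mb_col)."""
    rows = jnd_map[mb_row * mb_size:(mb_row + 1) * mb_size]
    values = [v for row in rows
              for v in row[mb_col * mb_size:(mb_col + 1) * mb_size]]
    return sum(values) / float(len(values))


def adjust_qp(base_qp, mb_mean, frame_mean, max_delta=3):
    """Hypothetical adjustment: raise QP only where the macroblock is less
    sensitive than the frame average; otherwise keep the original QP."""
    if mb_mean <= frame_mean:
        return base_qp
    # Step size doubles every 6 QP in H.264, so a step ratio maps to 6 * log2(ratio).
    delta = int(round(6.0 * math.log2(mb_mean / frame_mean)))
    return min(51, base_qp + min(max_delta, delta))  # 51 is the largest H.264 QP
```

Leaving the QP untouched for sensitive macroblocks is what keeps the subjective quality from dropping below that of the unmodified encoder.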

Step (5) dynamically adjusts the Lagrangian parameter in the rate-distortion optimization process (see Fig. 6), as follows:

(5-1) Compute and compare the JND mean of the current frame and the JND mean of the current coded macroblock; the comparison provides the basis for weighting the Lagrangian parameter in the next step.

(5-2) Adjust the Lagrangian parameter. Since the quantization parameter has been adjusted, the distortion and the bit rate in the Lagrangian rate-distortion optimization change, and the original Lagrangian parameter no longer guarantees an optimal solution. Weighting the Lagrangian parameter accordingly brings the cost function back to its optimum. The adjusted Lagrangian parameter is a function of the quantization parameter generated by the multi-view coding method and the adjusted quantization parameter of the M-th macroblock.

(5-3) Substitute the adjusted Lagrangian parameter into the rate-distortion cost function, whose terms are the distortion, the number of bits coded under each coding mode and the adjusted Lagrangian parameter. In this way the Lagrangian parameter changes together with the quantization parameter, so the rate-distortion optimization still reaches an optimal solution.
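The expression for the adjusted Lagrangian parameter is not reproduced in this text. As a sketch, assuming the relation λ = 0.85 · 2^((QP − 12)/3) commonly used for mode decision in H.264 reference encoders, the adjusted parameter can be obtained by weighting the original one with the QP change. Whether the patent uses exactly this weighting is an assumption.

```python
def lagrangian(qp):
    """Mode-decision Lagrangian parameter for a given QP (common H.264 relation)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def adjusted_lagrangian(base_qp, adjusted_qp):
    """Weight the original parameter by the QP change of the macroblock
    (assumed rule; equivalent to evaluating lagrangian(adjusted_qp))."""
    return lagrangian(base_qp) * 2.0 ** ((adjusted_qp - base_qp) / 3.0)
```

Under this assumption, raising the QP of a macroblock by 3 doubles its Lagrangian parameter, so the mode decision weighs bits more heavily exactly where coarser quantization was allowed.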

Step (6) entropy-encodes the quantized data to form a bitstream that is transmitted over the network (see Fig. 7), as follows:

(6-1) Entropy-encode the quantized data so that they are represented as efficiently as possible by a binary bitstream, which removes the statistical redundancy of the quantized data.

(6-2) Transmit the bitstream produced by entropy coding over the network. Because the coding method that exploits visual perception characteristics occupies less bandwidth, it is better suited to network transmission.
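This text does not name the entropy coder. As an illustration of how a variable-length code removes statistical redundancy, the sketch below uses the signed Exp-Golomb code that H.264 defines for syntax elements; it is a stand-in chosen for brevity, not the CAVLC or CABAC residual coding that an actual encoder would use.

```python
def exp_golomb_unsigned(code_num):
    """Unsigned Exp-Golomb code word for a non-negative integer, as a bit string."""
    bits = bin(code_num + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def exp_golomb_signed(value):
    """Signed Exp-Golomb code word: positive k maps to 2k - 1, others to -2k."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return exp_golomb_unsigned(code_num)


def encode_levels(levels):
    """Concatenate the code words of a sequence of quantized levels."""
    return "".join(exp_golomb_signed(v) for v in levels)
```

Small magnitudes, which dominate after coarser quantization, receive the shortest code words, so a larger quantization step directly shortens the bitstream.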

A large number of simulation experiments were carried out to evaluate the performance of the proposed multi-view video coding method based on visual characteristics. The first 48 frames of the multi-view video sequences ballroom, race1 and crowd were encoded and decoded on a PC with an Intel Pentium 4 CPU at 3.00 GHz, 512 MB of memory, an Intel 8254G Express Chipset Family and Windows XP. The basic QP was set to 20, 24, 28 and 32; the multi-view video coding reference software JMVC served as the experimental platform; the HHI-IBBBP prediction structure was used; and inter-view prediction was bidirectional.

The experimental results for the video sequence ballroom are shown in Figs. 8a, 8b and 9. Fig. 8a is the reconstruction of frame 15 of view 0 at QP = 24 using the original JMVC coding method; its PSNR is 40.31 dB. Fig. 8b is the reconstruction of the same frame at QP = 24 using the method of the invention; its PSNR is 40.10 dB. Fig. 9 reports, for the original JMVC coding method and the method of the invention at different QPs and views, the bit rate, the PSNR, the percentage bit-rate saving, the difference in subjective quality score of the reconstructed video (DMOS) and the average percentage bit-rate saving. Across the tested QPs, the method of the invention lowers the bit rate of ballroom by 7.47% to 9.16% relative to the original JMVC coding method, and the DMOS between the two methods is 0.03 to 0.07, so the subjective quality can be regarded as unchanged.

The experimental results for the video sequence race1 are shown in Figs. 10a, 10b and 11. Fig. 10a is the reconstruction of frame 35 of view 1 at QP = 24 using the original JMVC coding method; its PSNR is 41.15 dB. Fig. 10b is the reconstruction of frame 35 of view 1 at QP = 24 using the method of the invention; its PSNR is 40.51 dB. Fig. 11 reports, for the original JMVC coding method and the method of the invention at different QPs and views, the bit rate, the PSNR, the percentage bit-rate saving, the difference in subjective quality score of the reconstructed video (DMOS) and the average percentage bit-rate saving. Across the tested QPs, the method of the invention lowers the bit rate of race1 by 10.77% to 12.35% relative to the original JMVC coding method, and the DMOS between the two methods is 0.06 to 0.09, so the subjective quality can be regarded as unchanged.

The experimental results for the video sequence crowd are shown in Figures 12a-12b and Figure 13. Figure 12a shows frame 45 of viewpoint 2 of the crowd sequence reconstructed with the original JMVC coding method at quantization parameter QP=35; the PSNR of the reconstructed image is 33.77 dB. Figure 12b shows the same frame reconstructed with the method of the present invention at QP=35; the PSNR of the reconstructed image is 33.12 dB. Figure 13 tabulates, for the crowd sequence coded with the original JMVC method and with the method of the present invention at different QPs and viewpoints, the bit rate, the PSNR, the bit-rate saving percentage, the difference in subjective quality evaluation score of the reconstructed video (DMOS), and the average bit-rate saving percentage. Across the tested QPs, the method of the present invention lowers the bit rate of the crowd sequence by 8.95% to 9.83% relative to the original JMVC coding method, while the subjective quality scores of the two methods differ by only 0.03 to 0.08, so the subjective quality can be considered unchanged.
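
The bit-rate saving percentage and PSNR quoted for each sequence follow the usual definitions. The following is a minimal Python sketch of those two quantities, assuming 8-bit samples; the function names and the numbers in the usage lines are illustrative and are not taken from Figures 9, 11 or 13.

```python
import math


def bitrate_saving_percent(rate_ref, rate_test):
    """Percentage of bit rate saved by the test encoder relative to the reference."""
    return (rate_ref - rate_test) / rate_ref * 100.0


def psnr_db(mse, peak=255.0):
    """PSNR in dB for a given mean squared error (peak=255 assumes 8-bit samples)."""
    return 10.0 * math.log10(peak * peak / mse)


# Illustrative values only: a reference stream at 1000 kbit/s and a test stream at 900 kbit/s.
saving = bitrate_saving_percent(1000.0, 900.0)
quality = psnr_db(6.5)
```

A saving computed this way is positive when the test encoder produces fewer bits than the reference at the same QP.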

Taking the above figures and tables together, it can be seen that the present invention, by establishing a JND model in the DCT domain and applying it to the quantization process and the rate-distortion optimization process of the multi-view video coding framework, substantially reduces the multi-view video coding bit rate (by 7.47% to 12.35% across the three test sequences) while keeping the subjective quality unchanged, thereby improving the compression efficiency of multi-view video coding.

Claims

1. A method for guiding the multi-view video coding quantization process by using visual perception characteristics, characterized in that the operation steps are as follows:

(1) Read the luminance value of each frame of the input video sequence, and establish a just discernible distortion threshold model in the frequency domain,

(2) Each frame of the input video sequence undergoes intra-viewpoint and inter-viewpoint prediction,

(3) Discrete cosine transform is performed on the residual data,

(4) Dynamically adjust the quantization step size of each macroblock in the current frame,

(5) Dynamically adjust the Lagrangian parameter in the rate-distortion optimization process,

(6) Entropy encoding is performed on the quantized data to form a code stream for transmission through the network.

2. The method for guiding the multi-view video coding quantization process by using visual perception characteristics according to claim 1, characterized in that the operation steps of step (1), reading the luminance value of each frame of the input video sequence and establishing the just discernible distortion threshold model in the frequency domain, are as follows:

① Calculate the spatial sensitivity factor for the 4×4 and 8×8 DCT according to the size of the DCT. In its formula, s is the control parameter, and the remaining terms are the angle of the frequency represented by the DCT coefficient vector, the DCT coefficient normalization factor, and the spatial frequency. The parameters r, a, b and c depend on the size of the DCT: for the DCT coding format of 8×8 block size, r is 0.6, a is 1.33, b is 0.11 and c is 0.18; for the DCT coding format of 4×4 block size, r is 0.6, a is 0.8, b is 0.035 and c is 0.008;

② Represent the luminance masking factor by a curve fitted to experimental measurements of the luminance masking effect of the human eye under different background luminance conditions; the curve is expressed in terms of the average pixel value of the current coding block;

③ Use an edge detector to detect the texture characteristics of the current coding block and obtain the texture masking factor; its expression is written in terms of the horizontal and vertical coordinates of the coefficient within the transform block, the contrast masking estimation factor, the spatial sensitivity factor, and the DCT coefficients of the n-th coding block of the current frame;

④ Obtain the temporal masking factor, determined experimentally, according to the speed of object motion in each frame of the video sequence; its expression is a function of the spatial frequency and the temporal frequency;

⑤ The weighted product of the four factors obtained in steps ① to ④ constitutes the just discernible distortion threshold of the current coded frame.
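
The formulas of claim 2 are not reproduced in this text. As a rough illustration only, the Python sketch below assumes that the spatial sensitivity factor of step ① takes the form commonly used in DCT-domain JND models, T = s / (φ_i φ_j) · exp(c ω) / (a + b ω) / (r + (1 − r) cos² φ), and that the weights of step ⑤ are all equal to 1. The function names, the default control parameter `s`, and the pixel visual angle `theta_deg` are hypothetical; only the (r, a, b, c) values come from the claim.

```python
import math

# (r, a, b, c) for each DCT size, as listed in step 1 of claim 2.
PARAMS = {8: (0.6, 1.33, 0.11, 0.18), 4: (0.6, 0.8, 0.035, 0.008)}


def spatial_sensitivity(i, j, n, s=0.25, theta_deg=0.04):
    """Assumed base threshold for DCT coefficient (i, j) of an n x n block.

    s and theta_deg (visual angle of one pixel, in degrees) are hypothetical
    defaults chosen for illustration only.
    """
    r, a, b, c = PARAMS[n]

    def norm(m):
        # DCT normalization factor for row or column index m.
        return math.sqrt((1.0 if m == 0 else 2.0) / n)

    def freq(u, v):
        # Spatial frequency of coefficient (u, v) in cycles per degree.
        return math.hypot(u, v) / (2.0 * n * theta_deg)

    w = freq(i, j)
    if w == 0.0:
        angle = 0.0  # the DC coefficient has no direction
    else:
        angle = math.asin(min(1.0, 2.0 * freq(i, 0) * freq(0, j) / (w * w)))
    oblique = r + (1.0 - r) * math.cos(angle) ** 2
    return s / (norm(i) * norm(j)) * math.exp(c * w) / (a + b * w) / oblique


def jnd_threshold(t_spatial, f_luminance, f_texture, f_temporal):
    """Step 5 with all weights assumed to be 1: plain product of the four factors."""
    return t_spatial * f_luminance * f_texture * f_temporal
```

Under these assumptions a larger threshold means the coefficient tolerates more distortion before the change becomes visible; the luminance, texture and temporal factors would be supplied by steps ② to ④.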

3. The method for guiding the multi-view video coding quantization process by using visual perception characteristics according to claim 1, characterized in that the operation steps of step (2), in which each frame of the input video sequence undergoes intra-viewpoint and inter-viewpoint prediction, are as follows:

① Perform inter-frame and intra-frame prediction within the viewpoint, compare the predicted value with the current frame to be encoded, and select an encoding method with a lower encoding cost;

② For inter-view prediction, the current encoded frame of the current view is predicted based on the corresponding frame of the reference view, and the predicted value is compared with the corresponding frame of the reference view to obtain the encoding cost of inter-view prediction;

③ Compare the coding cost between views and within the view, and choose the prediction mode with smaller coding cost.

4. The method for guiding the multi-view video coding quantization process by using visual perception characteristics according to claim 1, characterized in that the operation steps of step (3), performing the discrete cosine transform on the residual data, are as follows:

① Determine the size of the coding block: when the length of any side of the coding block is less than 8, classify it as a 4×4 transform block; otherwise classify it as an 8×8 transform block;

② For a 4×4 transform block, select the 4×4 DCT; for an 8×8 transform block, select the 8×8 DCT.
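
The size rule of claim 4 can be sketched in Python as follows; the function name and the width/height arguments are illustrative.

```python
def select_transform_size(width, height):
    """Return 4 when any side of the coding block is shorter than 8, otherwise 8."""
    return 4 if min(width, height) < 8 else 8


# A 16x8 partition keeps the 8x8 DCT, while an 8x4 partition falls back to the 4x4 DCT.
sizes = [select_transform_size(16, 8), select_transform_size(8, 4)]
```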

5. The method for guiding the multi-view video coding quantization process by using visual perception characteristics according to claim 1, characterized in that the operation steps of step (4), dynamically adjusting the quantization step size of each macroblock in the current frame, are as follows:

① Calculate the mean of the just discernible distortion threshold of the current frame;

② Calculate the mean of the just discernible distortion threshold of the macroblock currently being coded;

③ Compare the mean of the just discernible distortion threshold of the current frame with that of the current macroblock, and dynamically adjust the quantization step size of the current macroblock accordingly. The expression for the adjusted quantization step size is written in terms of the original quantization step size of the coded frame, the mean of the just discernible distortion threshold of the current macroblock, the mean of the just discernible distortion threshold of the current frame, and an adjustment factor.
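
The expression of claim 5 is not reproduced in this text, so the Python sketch below only illustrates the direction of the adjustment: a macroblock whose mean threshold exceeds the frame mean tolerates more distortion and receives a larger step, and vice versa. The linear rule, the function name, and the default adjustment factor `alpha` are hypothetical and should not be read as the claimed expression.

```python
def adjust_quant_step(q_orig, jnd_mb_mean, jnd_frame_mean, alpha=0.5):
    """Hypothetical rule: scale the step by the relative deviation of the
    macroblock's mean threshold from the frame's mean threshold."""
    deviation = (jnd_mb_mean - jnd_frame_mean) / jnd_frame_mean
    return q_orig * (1.0 + alpha * deviation)


# Illustrative values: a macroblock more tolerant than the frame average gets a coarser step.
coarser = adjust_quant_step(10.0, 6.0, 4.0)
finer = adjust_quant_step(10.0, 2.0, 4.0)
```

With this rule a macroblock whose mean threshold equals the frame mean keeps its original step size.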

6. The method for guiding the multi-view video coding quantization process by using visual perception characteristics according to claim 1, characterized in that the operation steps of step (5), dynamically adjusting the Lagrangian parameter in the rate-distortion optimization process, are as follows:

① Compare the mean of the just discernible distortion threshold of the current frame with the mean of the just discernible distortion threshold of the current macroblock;

② Adjust the Lagrangian parameter; the expression for the adjusted Lagrangian parameter is written in terms of an adjustment factor, the adjusted quantization step size, the original quantization step size of the coded frame, the mean of the just discernible distortion threshold of the current macroblock, and the mean of the just discernible distortion threshold of the current frame;

③ Optimize the coding cost function with the dynamically adjusted Lagrangian parameter, so that rate-distortion optimization again reaches the optimal solution after the quantization step size has changed; the cost expression is written in terms of the distortion, the number of bits produced under each coding mode, and the adjusted Lagrangian parameter.
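
The expressions of claim 6 are not reproduced in this text. The Python sketch below assumes the standard Lagrangian cost J = D + λ·R and, as a further assumption, rescales λ with the square of the quantization step ratio, which mirrors the proportionality between the mode-decision multiplier and the squared step size used in H.264 reference encoders. The function names and all numbers are illustrative.

```python
def adjust_lambda(lambda_orig, q_orig, q_adj):
    """Assumed scaling: lambda grows with the square of the quantization step."""
    return lambda_orig * (q_adj / q_orig) ** 2


def rd_cost(distortion, bits, lam):
    """Standard Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def best_mode(candidates, lam):
    """Pick the mode with the lowest cost; candidates maps a name to (distortion, bits)."""
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Illustrative candidates: (distortion, bits) per prediction mode.
modes = {"intra": (100.0, 50), "inter": (120.0, 20), "inter_view": (150.0, 10)}
choice_small_lambda = best_mode(modes, 1.0)
choice_large_lambda = best_mode(modes, adjust_lambda(1.0, 8.0, 16.0))
```

Raising λ together with the step size shifts the decision toward modes that spend fewer bits, which is why the multiplier has to follow the adjusted quantization step for the cost comparison to stay balanced.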

7. The method for guiding the multi-view video coding quantization process by using visual perception characteristics according to claim 1, characterized in that the operation steps of step (6), entropy coding the quantized data to form a code stream for transmission through the network, are as follows:

① The quantized data is entropy encoded, so that the quantized data forms a binary code stream;

② The encoded code stream is transmitted through the network.