CN111263163A - Method for realizing depth video compression framework based on mobile phone platform - Google Patents

Method for realizing depth video compression framework based on mobile phone platform

Info

Publication number
CN111263163A
CN111263163A CN202010104794.1A CN202010104794A CN111263163A CN 111263163 A CN111263163 A CN 111263163A CN 202010104794 A CN202010104794 A CN 202010104794A CN 111263163 A CN111263163 A CN 111263163A
Authority
CN
China
Prior art keywords
video compression
network
net
frame
pruning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010104794.1A
Other languages
Chinese (zh)
Inventor
冯落落
李锐
乔廷慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Hi Tech Investment and Development Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN202010104794.1A
Publication of CN111263163A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a method for implementing a deep video compression framework on a mobile phone platform. It relates to fields such as image classification, object detection, and face recognition, and comprises the following steps: S1, building the entire video compression network, training the model on videos of many different scenes to obtain a trained large network, and saving the network's graph model and parameter information; S2, pruning and quantizing the trained model; S3, with pruning and quantization performed separately for each layer, Huffman coding the weights of the entire network and then storing them. With little loss of accuracy, the deep video compression model is compressed by pruning, quantization, and Huffman coding to about 1/100 of its original size, so that a deep-learning-based video compression framework can be conveniently deployed on mobile phone devices.

Description

A Method for Implementing a Deep Video Compression Framework on a Mobile Phone Platform

Technical Field

The invention relates to fields such as image classification, object detection, and face recognition, and in particular to a method for implementing a deep video compression framework on a mobile phone platform.

Background

Today, video has become the main medium through which the public disseminates information. With the rise of self-media in particular, the volume of video data is growing explosively. Deep-learning-based video compression has become a mainstream research direction and a strong competitor to the prevailing H.264 and H.265 standards.

However, deep-learning-based video compression models usually have a very large number of parameters. Because mobile phones are limited in storage and computing power, such models cannot be deployed on them directly. How to compress a deep learning video compression model so that it can be deployed on a mobile phone has therefore become a key problem.

Summary of the Invention

The technical task of the invention is to overcome the shortcoming that existing deep learning video compression frameworks are very large and hard to deploy on embedded devices such as mobile phones, by providing a method for implementing a deep video compression framework on a mobile phone platform. With little loss of accuracy, the invention compresses the deep video compression model using pruning, quantization, and Huffman coding, so that a deep-learning-based video compression framework can be deployed on a mobile phone.

The technical solution adopted by the invention to solve this problem is as follows:

This patent mainly proposes using pruning, quantization, and Huffman coding to deploy a well-performing deep-learning-based video compression framework on a mobile phone platform.

1. A method for implementing a deep video compression framework on a mobile phone platform, comprising the following steps:

S1. Build the entire video compression network and train the model on more than 5,000 videos of different scenes for a total of 1,000,000 iterations to obtain a trained large network, then save the network's graph model and parameter information;

S2. Prune and quantize the trained model;

S3. Pruning and quantization are performed separately for each layer. To further reduce storage, the weights of the entire network are Huffman coded and then stored.

Preferably, the video compression network in step S1 is built with the TensorFlow framework and comprises six networks: Optical Flow Net, MV Encoder Net, MV Decoder Net, Motion Compensation Net, Residual Encoder Net, and Residual Decoder Net. It works as follows:

S101. Split the video into individual frames, and input the current frame and the previous reconstructed frame into the optical flow network (Optical Flow Net) to obtain the motion vector of the current frame;

S102. Encode the motion vector with the motion vector encoding network (MV Encoder Net) to obtain the encoded result;

S103. Quantize (Q) the encoded result to obtain the quantized result, which is one of the items that must be stored for the current frame;

S104. Pass the quantized result through the motion vector decoding network (MV Decoder Net) to obtain the reconstructed motion vector of the current frame, and input it together with the previous reconstructed frame into the motion compensation network (Motion Compensation Net) to obtain the predicted frame of the current frame;

S105. Subtract the predicted frame from the real frame to obtain the residual information r_t that the predicted frame fails to capture;

S106. Encode the residual information with the Residual Encoder Net, quantize (Q) it, and entropy code it for storage; then decode it with the Residual Decoder Net to obtain the reconstructed residual, and add this to the predicted frame to obtain the final reconstructed frame;

S107. The compressed video must store the quantized motion vector code from step S103 and the quantized residual code from step S106.

Preferably, step S2 comprises the following steps:

S201. First, prune. The trained weights of each layer are visualized, and all values whose absolute value is less than 0.5 are pruned, yielding a sparse matrix, which is then stored.

Instead of storing the index as an absolute position, a relative value diff is stored, where diff is the offset of the current value from the previous one. The maximum offset is set to 8, so each offset takes 3 bits. In addition, a filler 0 is inserted at position 12, so that the entry at idx 15 has an offset of 3;

S202. After pruning, quantize the pruned data.
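
The relative-index storage of step S201 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the sample positions 1 and 4, and the sample values are assumptions; only the maximum offset of 8, the filler 0 at position 12, and the offset of 3 for idx 15 come from the text. Positions are assumed to start at 1 or later, so that every gap lies in 1..8 and can be stored as gap - 1 in 3 bits.

```python
def encode_relative(positions, values, bits=3):
    """Store each non-zero entry as (gap to previous entry, value).

    Gaps larger than 2**bits are bridged with filler zeros.
    """
    max_gap = 1 << bits            # 8 when 3 bits are used
    gaps, stored = [], []
    prev = 0
    for pos, val in zip(positions, values):
        while pos - prev > max_gap:
            prev += max_gap        # jump the maximum distance...
            gaps.append(max_gap)
            stored.append(0.0)     # ...and store a filler zero there
        gaps.append(pos - prev)
        stored.append(val)
        prev = pos
    return gaps, stored


def decode_relative(gaps, stored):
    """Recover absolute positions, dropping filler zeros."""
    positions, values, pos = [], [], 0
    for gap, val in zip(gaps, stored):
        pos += gap
        if val != 0.0:             # pruned matrices keep only non-zero values
            positions.append(pos)
            values.append(val)
    return positions, values


# Hypothetical non-zero entries at idx 1, 4 and 15.
gaps, stored = encode_relative([1, 4, 15], [3.4, 0.9, 1.7])
```

With these inputs the gap between idx 4 and idx 15 is 11, which exceeds 8, so a filler zero lands at position 12 and the entry at idx 15 is stored with an offset of 3.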

Preferably, in step S201, the CSR format is used to store the matrix.

Preferably, in step S202, the conventional K-means algorithm is used to quantize the matrix.

Preferably, in step S202, the K-means algorithm proceeds as follows:

First, the initial values for K-means are selected by sampling. The K used is 11, that is, 11 points are selected;

K-means is then run to obtain the final 11 center points, and the data are clustered into the corresponding clusters. Taking K=4 as an illustration, K-means clustering assigns each value to a cluster and yields 4 cluster centers; only these 4 numbers need to be stored, together with the cluster index of each value;

After quantization, the model is fine-tuned: the gradient of each parameter is computed by backpropagation, the gradients within each cluster are summed, and the summed gradient is used for a gradient descent step on the quantized parameter: param - lr * gradient.

Preferably, the initial values for K-means are selected with a data-density-based method: the frequency with which a value occurs is used as its selection probability, and sampling is then performed.

Compared with the prior art, the method of the invention for implementing a deep video compression framework on a mobile phone platform has the following beneficial effects:

With little loss of accuracy, the invention compresses the deep video compression model using pruning, quantization, and Huffman coding, reducing the model to about 1/100 of its original size, so that a deep-learning-based video compression framework can be conveniently deployed on mobile phone devices.

Brief Description of the Drawings

To describe the working principle of the invention more clearly, schematic drawings are attached for further explanation.

Fig. 1 is a schematic diagram of the deep learning video compression framework used in the invention;

Fig. 2 is a schematic diagram of the index storage of the invention;

Fig. 3 is a schematic diagram of matrix quantization using the K-means algorithm in the invention;

Fig. 4 is a storage diagram of the CSR sparse matrix of the invention.

The reference numerals in the figures denote:

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

Referring to Figs. 1-3, the method of the invention for implementing a deep video compression framework on a mobile phone platform comprises the following steps:

S1. Using the TensorFlow framework, build the entire video compression network, which comprises the six networks shown in Fig. 1: Optical Flow Net, MV Encoder Net, MV Decoder Net, Motion Compensation Net, Residual Encoder Net, and Residual Decoder Net. The model is then trained on more than 5,000 videos of different scenes for a total of 1,000,000 iterations to obtain a trained large network, and the network's graph model and parameter information are saved.

The video compression network in Fig. 1 works as follows:

S101. Split the video into individual frames, and input the current frame and the previous reconstructed frame into the optical flow network (Optical Flow Net) to obtain the motion vector of the current frame;

S102. Encode the motion vector with the motion vector encoding network (MV Encoder Net) to obtain the encoded result;

S103. Quantize (Q) the encoded result to obtain the quantized result, which is one of the items that must be stored for the current frame;

S104. Pass the quantized result through the motion vector decoding network (MV Decoder Net) to obtain the reconstructed motion vector of the current frame, and input it together with the previous reconstructed frame into the motion compensation network (Motion Compensation Net) to obtain the predicted frame of the current frame;

S105. Subtract the predicted frame from the real frame to obtain the residual information r_t that the predicted frame fails to capture;

S106. Encode the residual information with the Residual Encoder Net, quantize (Q) it, and entropy code it for storage; then decode it with the Residual Decoder Net to obtain the reconstructed residual, and add this to the predicted frame to obtain the final reconstructed frame;

S107. The compressed video must store the quantized motion vector code from step S103 and the quantized residual code from step S106.
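
The data flow of steps S101 to S107 can be sketched with toy stand-ins for the six networks. Everything below is an assumption made for illustration: the trained networks are replaced by trivial functions, a frame is a short list of numbers, motion is a single circular shift, and quantization is plain rounding. Only the order of operations follows the steps above.

```python
def shift(frame, s):
    """Circularly shift a 1-D 'frame' by s samples (toy motion model)."""
    n = len(frame)
    return [frame[(i - s) % n] for i in range(n)]


def quantize(xs):
    """Q: round every component to the nearest integer."""
    return [round(x) for x in xs]


def toy_optical_flow(current, prev_recon):
    """Pick the shift in {-1, 0, 1} that best aligns prev_recon with current."""
    def sad(s):
        return sum(abs(c - p) for c, p in zip(current, shift(prev_recon, s)))
    return [float(min((-1, 0, 1), key=sad))]


# Hypothetical stand-ins for the six trained networks.
NETS = {
    "optical_flow": toy_optical_flow,
    "mv_encoder": lambda mv: mv,
    "mv_decoder": lambda code: [float(c) for c in code],
    "motion_compensation": lambda prev, mv: shift(prev, int(mv[0])),
    "residual_encoder": lambda r: r,
    "residual_decoder": lambda code: [float(c) for c in code],
}


def code_frame(current, prev_recon, nets=NETS):
    mv = nets["optical_flow"](current, prev_recon)               # S101
    mv_code = quantize(nets["mv_encoder"](mv))                   # S102, S103
    mv_hat = nets["mv_decoder"](mv_code)                         # S104
    predicted = nets["motion_compensation"](prev_recon, mv_hat)  # S104
    residual = [c - p for c, p in zip(current, predicted)]       # S105
    res_code = quantize(nets["residual_encoder"](residual))      # S106
    res_hat = nets["residual_decoder"](res_code)                 # S106
    recon = [p + r for p, r in zip(predicted, res_hat)]          # S106
    return mv_code, res_code, recon                              # S107 stores both codes


prev_recon = [10.0, 20.0, 30.0, 40.0]
current = [42.0, 10.0, 19.6, 30.0]
mv_code, res_code, recon = code_frame(current, prev_recon)
```

The encoder reconstructs the frame exactly as a decoder would, from the two stored codes alone, so the next frame is predicted from the same reference on both sides.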

S2. Prune and quantize the trained model;

The trained model is pruned and quantized layer by layer. The specific steps are as follows:

S201. First, prune. Visualizing the trained weights of each layer shows that most values in every layer have a very small absolute value, close to 0. All values whose absolute value is less than 0.5 are therefore pruned, which yields a sparse matrix. The sparse matrix is stored with the commonly used CSR method, as shown in Fig. 4.

As shown in Fig. 4, consider a 3*3 sparse matrix. The information to be stored is the retained values [1, 2, 3, 4, 5, 6] and the column index of each value [0, 2, 2, 0, 1, 2]. Because storage is row by row, we also need to know which entries of the value list belong to the same row, so a third list [0, 2, 3, 6] is stored: it indicates that 1 and 2 belong to one row, 3 to the next, and 4, 5, 6 to the last. Storing the sparse matrix therefore takes 2*a+n+1 numbers in total, where a is the number of retained values and n is the number of rows; for a sufficiently sparse n*n matrix this is far smaller than n*n.

To compress the stored data further, index values such as [0, 2, 3, 6] are stored not as absolute positions but as relative values, which reduces the number of bits needed per index. As shown in Fig. 2, storing the absolute idx takes 4 bits per number, whereas storing diff takes only 3 bits. Here diff is the offset of the current value from the previous one. To store offsets in fewer bits, the maximum offset is set to 8 so that each offset fits in 3 bits; to make this work, a filler 0 is inserted at position 12, so that the entry at idx 15 has an offset of 3.
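
The pruning threshold and the CSR layout described above can be sketched in a few lines. The dense input matrix is an assumption chosen so that pruning reproduces the 3*3 example; the function names are illustrative and not part of the patent.

```python
def prune(matrix, threshold=0.5):
    """Zero out every weight whose absolute value is below the threshold."""
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in matrix]


def to_csr(matrix):
    """Return (values, column indices, row pointers) of a dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # where the next row starts in `values`
    return values, col_idx, row_ptr


# Hypothetical layer weights; the small entries are removed by pruning.
dense = [
    [1.0, 0.2, 2.0],
    [-0.1, 0.3, 3.0],
    [4.0, 5.0, 6.0],
]
values, col_idx, row_ptr = to_csr(prune(dense))
stored_numbers = len(values) + len(col_idx) + len(row_ptr)   # 2*a + n + 1
```

For this tiny example 2*a+n+1 is 16, which is more than the 9 dense entries; the saving only appears when the matrix is large and mostly zero after pruning.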

Experiments show that pruning reduces the storage requirement to 1/13 of the original with almost no loss of accuracy. This further demonstrates that deep learning weights contain a large amount of redundant information.

S202. After pruning, the pruned data are quantized; here the conventional K-means algorithm is used to quantize the matrix.

As shown in Fig. 3, the initial values for K-means are selected first. A data-density-based method is used: the frequency with which a value occurs is used as its selection probability, and sampling is then performed. The K used is 11, that is, 11 points are selected.
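
One way to read the density-based initialization is as sampling from a histogram of the weights, with each bin's count serving as its selection weight. The sketch below follows that reading; the bin width, the seed, the synthetic Gaussian weights, and the rule of resampling until 11 distinct centers are found are all assumptions, since the text does not specify them.

```python
import random
from collections import Counter


def density_based_init(weights, k, bin_width=0.05, seed=0):
    """Pick k distinct initial centers, favouring densely populated value ranges."""
    rng = random.Random(seed)
    # Histogram: how many weights fall into each bin of width bin_width.
    counts = Counter(round(w / bin_width) for w in weights)
    bins = sorted(counts)
    if k > len(bins):
        raise ValueError("need at least k occupied bins")
    freqs = [counts[b] for b in bins]
    chosen = set()
    while len(chosen) < k:
        # Selection probability is proportional to how often values occur.
        chosen.add(rng.choices(bins, weights=freqs, k=1)[0])
    return sorted(b * bin_width for b in chosen)


# Synthetic stand-in for one pruned layer (values below 0.5 already removed).
data_rng = random.Random(1)
pruned = [w for w in (data_rng.gauss(0.0, 1.0) for _ in range(2000)) if abs(w) >= 0.5]
centers = density_based_init(pruned, k=11)
```

Because dense regions are sampled more often, most initial centers start where most weights lie, which is the stated intent of the density-based choice.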

K-means is then run to obtain the final 11 center points, and the data are clustered into the corresponding clusters. Fig. 3 illustrates this with K=4: K-means clustering assigns each value to a cluster, and different colors in the figure denote different clusters. For example, the blue values 2.09, 2.12, 1.92, and 1.87 form one cluster and are all represented by the single value 2.00, which is the mean of the four. The other three clusters are processed in the same way, giving the 4 cluster centers 2.00, 1.50, 0.00, and -1.00.
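
The K=4 illustration can be reproduced with a plain one-dimensional K-means. In the sketch below only the four blue values and the four cluster centers come from the text; the other twelve weights and the initial centers are assumed values chosen to be consistent with those centers.

```python
def nearest(v, centers):
    """Index of the center closest to v."""
    return min(range(len(centers)), key=lambda j: abs(v - centers[j]))


def kmeans_1d(values, init_centers, iterations=20):
    """Lloyd's algorithm on scalars: assign to nearest center, then average."""
    centers = list(init_centers)
    for _ in range(iterations):
        assign = [nearest(v, centers) for v in values]
        for j in range(len(centers)):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:                       # keep an empty cluster's center as is
                centers[j] = sum(members) / len(members)
    return centers, [nearest(v, centers) for v in values]


# 16 weights of a hypothetical 4x4 layer, flattened row by row.
weights = [2.09, -0.98, 1.48, 0.09,
           0.05, -0.14, -1.08, 2.12,
           -0.91, 1.92, 0.0, -1.03,
           1.87, 0.0, 1.53, 1.49]
centers, indices = kmeans_1d(weights, init_centers=[2.12, 1.48, 0.0, -1.08])
codebook = [round(c, 2) for c in centers]          # shared values, one per cluster
quantized = [codebook[i] for i in indices]         # every weight replaced by its center
```

After clustering, the layer is fully described by `codebook` and `indices`; `quantized` is what the layer looks like when the shared values are substituted back.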

Only these 4 numbers then need to be stored, each occupying 32 bits. The cluster index of each value must also be stored, but an index needs only 2 bits. Compared with storing all 16 values at 32 bits each (512 bits), the quantized model needs 4*32 + 16*2 = 160 bits, which is only 5/16 of the original.
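
The storage arithmetic generalizes to any number of weights and clusters. The helper below is an illustrative sketch (its name and signature are assumptions): it counts the bits for the codebook plus the per-weight indices and compares them with plain 32-bit storage.

```python
from fractions import Fraction


def quantized_storage_ratio(num_weights, k, float_bits=32):
    """Bits after weight sharing divided by bits before."""
    index_bits = (k - 1).bit_length()        # smallest width that can index k clusters
    original = num_weights * float_bits
    quantized = k * float_bits + num_weights * index_bits
    return Fraction(quantized, original)


ratio_fig3 = quantized_storage_ratio(16, 4)           # the K=4 example with 16 weights
ratio_k11 = quantized_storage_ratio(1_000_000, 11)    # K=11 needs 4-bit indices
```

For large layers the codebook cost becomes negligible, so with K=11 the ratio approaches 4/32, that is, about one eighth of the original size before Huffman coding.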

After quantization, the model is fine-tuned. The gradient of each parameter is computed by backpropagation, the gradients within each cluster are summed, and the summed gradient is used for a gradient descent step on the quantized parameter: param - lr * gradient, as shown in Fig. 3.
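
The fine-tuning update can be sketched as a single function that groups per-weight gradients by cluster, sums them, and moves each shared value by param - lr * gradient. The gradients, indices, and learning rate below are made-up numbers for illustration; in practice the gradients would come from backpropagation through the network.

```python
def finetune_centroids(centroids, indices, grads, lr):
    """One gradient-descent step on the shared cluster values."""
    summed = [0.0] * len(centroids)
    for idx, g in zip(indices, grads):
        summed[idx] += g                     # add up gradients of weights sharing a cluster
    return [c - lr * s for c, s in zip(centroids, summed)]


centroids = [2.0, 1.5, 0.0, -1.0]
indices = [0, 1, 0, 3]                       # cluster of each of four hypothetical weights
grads = [0.1, -0.2, 0.3, 0.05]               # hypothetical per-weight gradients
updated = finetune_centroids(centroids, indices, grads, lr=0.1)
```

Only the shared values change during fine-tuning; the cluster index of each weight stays fixed, so the storage layout is unaffected.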

S3. Pruning and quantization are performed separately for each layer. To further reduce storage, the weights of the entire network are Huffman coded and then stored.
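
Huffman coding pays off when some cluster indices occur much more often than others. The sketch below builds a Huffman code over a hypothetical stream of 16 two-bit cluster indices; the index frequencies and function names are assumptions, and the text does not say exactly which stored values are Huffman coded.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Map each symbol to a prefix-free bit string; frequent symbols get short codes."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for that subtree).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:                   # prefix-free, so the first match is the symbol
            out.append(inverse[cur])
            cur = ""
    return out


# Hypothetical index stream: cluster 2 dominates.
indices = [2] * 9 + [0] * 3 + [1] * 2 + [3] * 2
code = huffman_code(indices)
bits = encode(indices, code)
```

Here the 16 indices take 27 bits instead of the 32 bits needed with fixed 2-bit codes; the gain grows as the distribution becomes more skewed.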

Experiments show that the compressed model occupies only 2/205 of the storage of the original model. A model of more than 1 GB is compressed to a few tens of megabytes, so it can be conveniently deployed on mobile phone devices.

Although preferred embodiments of the present application have been described, those skilled in the art may make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.

Obviously, those skilled in the art can make various changes and variations to the present application without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to cover them as well.

Apart from the technical features described in the specification, all else is technology known to those skilled in the art.

Claims (8)

1. A method for implementing a deep video compression framework on a mobile phone platform, characterized by comprising the following steps:
S1, building the entire video compression network, training the model on videos of a plurality of different scenes to obtain a trained large network, and saving the graph model and parameter information of the network;
S2, pruning and quantizing the trained model;
S3, with pruning and quantization performed separately for each layer, Huffman coding the weights of the entire network and then storing them.
2. The method for implementing a deep video compression framework on a mobile phone platform according to claim 1, wherein the video compression network in step S1 is built with the TensorFlow framework and comprises six networks, namely Optical Flow Net, MV Encoder Net, MV Decoder Net, Motion Compensation Net, Residual Encoder Net, and Residual Decoder Net.
3. The method for implementing a deep video compression framework on a mobile phone platform according to claim 2, wherein the video compression network of step S1 works as follows:
S101, splitting the video into individual frames, and inputting the current frame and the previous reconstructed frame into the optical flow network Optical Flow Net to obtain the motion vector of the current frame;
S102, encoding the motion vector with the motion vector encoding network MV Encoder Net to obtain an encoded result;
S103, quantizing the encoded result to obtain a quantized result, which is one of the items to be stored for the current frame;
S104, inputting the output of the motion vector decoding network MV Decoder Net, namely the reconstructed motion vector of the current frame, together with the previous reconstructed frame into the motion compensation network Motion Compensation Net to obtain the predicted frame of the current frame;
S105, subtracting the predicted frame from the real frame to obtain the residual information r_t that the predicted frame fails to capture;
S106, encoding the residual information with the Residual Encoder Net, quantizing (Q) it, and entropy coding it for storage, then decoding it with the Residual Decoder Net to obtain the reconstructed residual, and adding the reconstructed residual to the predicted frame to obtain the final reconstructed frame;
S107, storing, for the compressed video, the quantized motion vector code from step S103 and the quantized residual code from step S106.
4. The method for implementing a deep video compression framework on a mobile phone platform according to claim 1, 2 or 3, wherein step S2 comprises the following steps:
S201, pruning first: visualizing the trained weights of each layer and pruning all values whose absolute value is less than 0.5 to obtain a sparse matrix, and storing the sparse matrix,
wherein the index is stored not as an absolute position but as a relative value diff, diff being the offset of the current value from the previous value; the maximum offset is set to 8 so that each offset is stored in 3 bits, and a filler 0 is inserted at position 12 so that the entry at idx 15 has an offset of 3;
S202, after pruning, quantizing the pruned data.
5. The method of claim 4, wherein in step S201 the CSR format is used to store the matrix.
6. The method of claim 4, wherein in step S202 the matrix is quantized using the conventional K-means algorithm.
7. The method for implementing a deep video compression framework on a mobile phone platform according to claim 6, wherein in step S202 the K-means algorithm proceeds as follows:
first, selecting the initial values for K-means by sampling, the K used being 11, that is, 11 points are selected;
then running K-means to obtain the final 11 center points and clustering the data into the corresponding clusters; taking K=4 as an illustration, K-means clustering assigns each value to a cluster and yields 4 cluster centers, and only these 4 numbers need to be stored, together with the cluster index of each value;
after quantization, fine-tuning the model: computing the gradient of each parameter by backpropagation, summing the gradients within each cluster, and using the summed gradient for a gradient descent step on the quantized parameter: param - lr * gradient.
8. The method of claim 1, 2, 3, 5, 6 or 7, wherein the initial values for K-means are selected with a data-density-based method, that is, the frequency with which a value occurs is used as its selection probability, and sampling is then performed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010104794.1A CN111263163A (en) 2020-02-20 2020-02-20 Method for realizing depth video compression framework based on mobile phone platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010104794.1A CN111263163A (en) 2020-02-20 2020-02-20 Method for realizing depth video compression framework based on mobile phone platform

Publications (1)

Publication Number Publication Date
CN111263163A true CN111263163A (en) 2020-06-09

Family

ID=70952978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010104794.1A Pending CN111263163A (en) 2020-02-20 2020-02-20 Method for realizing depth video compression framework based on mobile phone platform

Country Status (1)

Country Link
CN (1) CN111263163A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255576A (en) * 2021-06-18 2021-08-13 第六镜科技(北京)有限公司 Face recognition method and device
CN114898446A (en) * 2022-06-16 2022-08-12 平安科技(深圳)有限公司 Artificial intelligence-based face recognition method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090074073A1 (en) * 2003-07-18 2009-03-19 Microsoft Corporation Coding of motion vector information
CN108304928A (en) * 2018-01-26 2018-07-20 西安理工大学 Compression method based on the deep neural network for improving cluster
CN110009565A (en) * 2019-04-04 2019-07-12 武汉大学 A Lightweight Network-Based Super-Resolution Image Reconstruction Method
CN110166779A (en) * 2019-05-23 2019-08-23 西安电子科技大学 Video-frequency compression method based on super-resolution reconstruction
CN110753225A (en) * 2019-11-01 2020-02-04 合肥图鸭信息科技有限公司 Video compression method and device and terminal equipment

Similar Documents

Publication Publication Date Title
Dupont et al. Coin++: Neural compression across modalities
CN113747163B (en) Image coding, decoding and compression methods based on context reorganization modeling
US10939123B2 (en) Multi-angle adaptive intra-frame prediction-based point cloud attribute compression method
CN111294604B (en) Video compression method based on deep learning
WO2021164176A1 (en) End-to-end video compression method and system based on deep learning, and storage medium
RU2565877C2 (en) Method and apparatus for determining correlation between syntax element and codeword for variable length coding
CN115361559B (en) Image encoding method, image decoding method, device and storage medium
CN111246206B (en) Optical flow information compression method and device based on self-encoder
CN110248190B (en) Multilayer residual coefficient image coding method based on compressed sensing
CN111432211B (en) Residual error information compression method for video coding
CN116489369B (en) Driving digital video compression processing method
CN112149652A (en) Space-spectrum joint depth convolution network method for lossy compression of hyperspectral image
CN111263163A (en) Method for realizing depth video compression framework based on mobile phone platform
CN118474377A (en) Depth video coding and decoding method supporting multiple calculation complexity
CN118075472A (en) Spectrum compression method based on LOCO-I algorithm and Huffman coding
CN114882133B (en) Image encoding and decoding method, system, device and medium
CN114067258B (en) A Hierarchical Coding Method for Facial Call Video
JP2017158183A (en) Image processing device
Kumar et al. Vector quantization with codebook and index compression
CN116320458A (en) Distillation training method and system for deep learning image coding and decoding network
CN111652789B (en) Big data-oriented color image watermark embedding and extracting method
CN117714706A (en) Hyperspectral image compression method based on spectral embedding
CN109218726B (en) Laser-induced breakdown spectroscopy image lossy lossless joint compression method
CN115913248A (en) Live broadcast software development data intelligent management system
CN118972620B (en) Image decoding and encoding methods, apparatus, devices and storage media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200609