The present application is the Chinese national stage of international patent application No. PCT/CN2020/071851, filed on January 13, 2020, which in turn claims priority to and the benefit of international patent application No. PCT/CN2019/071508, filed on January 13, 2019. The entire disclosure of the foregoing application is incorporated by reference as part of the disclosure of this application.
Detailed Description
Various techniques are provided herein that may be used by a decoder of a video bitstream to improve the quality of decompressed or decoded digital video. Furthermore, the video encoder may also implement these techniques during the encoding process in order to reconstruct the decoded frames for further encoding.
For ease of understanding, section headings are used herein and the embodiments and techniques are not limited to the corresponding sections. Thus, embodiments from one section may be combined with embodiments from other sections.
1. Overview of the invention
The present invention relates to video encoding and decoding techniques, and more particularly to overlapped block motion compensation in video coding. It can be applied to existing video coding standards such as High Efficiency Video Coding (HEVC), or to standards to be finalized (Versatile Video Coding). It is also applicable to future video coding standards or video codecs.
2. Background
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, the video coding standards have been based on a hybrid video coding structure, wherein temporal prediction plus transform coding are utilized. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015. Since then, many new methods have been adopted by JVET and put into the reference software named the Joint Exploration Model (JEM). In April 2018, the Joint Video Expert Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the Versatile Video Coding (VVC) standard targeting a 50% bitrate reduction compared to HEVC.
Fig. 12 is a block diagram of an example implementation of a video encoder. Fig. 12 shows an encoder implementation with a built-in feedback path, where the video encoder also performs a video decoding function (reconstructing a compressed representation of video data for encoding of the next video data).
2.1 sub-CU based motion vector prediction
In the JEM with quadtree plus binary tree (QTBT) partitioning, each CU can have at most one set of motion parameters for each prediction direction. Two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU. The alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the spatial-temporal motion vector prediction (STMVP) method, the motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and spatial neighboring motion vectors.
In order to maintain a more accurate motion field for sub-CU motion prediction, motion compression of the reference frame is currently disabled.
Fig. 1 is an example of ATMVP motion prediction of a CU.
2.1.1 alternative temporal motion vector prediction
In the alternative temporal motion vector prediction (ATMVP) method, temporal motion vector prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. ATMVP is also known as sub-block based temporal motion vector prediction (SbTMVP). As shown in Fig. 1, the sub-CUs are square N×N blocks (N is set to 4 by default).
ATMVP predicts the motion vectors of sub-CUs within a CU in two steps. The first step is to identify the corresponding block in the reference picture with a so-called temporal vector. The reference picture is called a motion source picture. The second step is to divide the current CU into sub-CUs and obtain a reference index and a motion vector of each sub-CU from the corresponding block of each sub-CU, as shown in fig. 1.
In the first step, the reference picture and the corresponding block are determined by the motion information of the spatial neighboring blocks of the current CU. To avoid the repetitive scanning process of neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU is used. The first available motion vector as well as its associated reference index are set to be the temporal vector and the index to the motion source picture. In this way, in ATMVP, the corresponding block (sometimes referred to as a collocated block) may be identified more accurately than in TMVP, where the corresponding block is always located in a bottom-right or center position relative to the current CU.
In the second step, the corresponding block of each sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid covering the center sample) is used to derive the motion information of the sub-CU. After the motion information of the corresponding N×N block is identified, it is converted to a motion vector and a reference index of the current sub-CU in the same way as the TMVP method of HEVC, in which motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (e.g., the POCs of all reference pictures of the current picture are smaller than the picture order count (POC) of the current picture), and may use the motion vector MVx (the motion vector corresponding to reference picture list X) to predict the motion vector MVy (with X equal to 0 or 1 and Y equal to 1 − X) for each sub-CU.
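The low-delay condition mentioned above (every reference POC smaller than the current POC) can be sketched as follows; this is an illustrative Python sketch, and the helper name and POC values are assumptions, not from any standard text.

```python
# Illustrative sketch of the low-delay check described above: the condition
# holds when every reference picture of the current picture has a smaller
# POC than the current picture (i.e., no "future" references).
def is_low_delay(ref_poc_lists, cur_poc):
    """ref_poc_lists: POCs of the reference pictures in list 0 and list 1."""
    return all(poc < cur_poc for refs in ref_poc_lists for poc in refs)

print(is_low_delay([[0, 2], [1, 3]], 4))  # True: all references precede POC 4
print(is_low_delay([[0, 2], [8]], 4))     # False: POC 8 is a future reference
```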
2.1.2 spatial-temporal motion vector prediction
In this approach, the motion vectors of the sub-CUs are derived recursively, following raster scan order. Fig. 2 illustrates this concept. Consider an 8×8 CU that contains four 4×4 sub-CUs A, B, C, and D. The neighboring 4×4 blocks in the current frame are labeled a, b, c, and d.
The motion derivation of sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A (block c). If this block c is not available or is intra coded, the other N×N blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbor is the block to the left of sub-CU A (block b). If block b is not available or is intra coded, the other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame of the given list. Next, the temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure as the TMVP derivation specified in HEVC: the motion information of the collocated block at location D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
FIG. 2 is an example of one CU with four sub-blocks (A–D) and neighboring blocks (a–d).
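The per-list averaging step of the STMVP derivation above can be sketched as follows. This is a hypothetical Python sketch; the name `average_mvs` and the integer quarter-pel tuple representation are assumptions, and only the averaging of the up-to-three already-scaled motion vectors is shown.

```python
# Hedged sketch of STMVP's final step: average all available (scaled)
# motion vectors for one reference list. Up to three contributors exist:
# the above neighbor, the left neighbor, and the TMVP.
from typing import List, Optional, Tuple

Mv = Tuple[int, int]  # (horizontal, vertical), e.g., in quarter-pel units

def average_mvs(candidates: List[Optional[Mv]]) -> Optional[Mv]:
    """Average the available MVs; unavailable contributors are None."""
    avail = [mv for mv in candidates if mv is not None]
    if not avail:
        return None
    n = len(avail)
    return (sum(mv[0] for mv in avail) // n, sum(mv[1] for mv in avail) // n)

above, left, tmvp = (4, -2), None, (2, 2)  # left neighbor unavailable
print(average_mvs([above, left, tmvp]))  # (3, 0)
```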
2.1.3 sub-CU motion prediction mode Signaling
The sub-CU modes are enabled as additional Merge candidates and no additional syntax element is required to signal the modes. Two additional Merge candidates are added to the Merge candidate list of each CU to represent the ATMVP mode and the STMVP mode. Up to seven Merge candidates are used if the sequence parameter set indicates that ATMVP and STMVP are enabled. The encoding logic of the additional Merge candidates is the same as that of the Merge candidates in the HM, which means that for each CU in a P slice or B slice, two more RD checks are needed for the two additional Merge candidates.
In JEM, all bins of the Merge index are context coded by context-adaptive binary arithmetic coding (CABAC). In HEVC, however, only the first bin is context coded and the remaining bins are context-bypass coded.
2.2 overlapped block motion Compensation
Overlapped block motion compensation (OBMC) has previously been used in H.263. In JEM, unlike in H.263, OBMC can be switched on and off using syntax at the CU level. When OBMC is used in JEM, it is performed for all motion compensated (MC) block boundaries except the right and bottom boundaries of a CU. Moreover, it is applied to both the luma and chroma components. In JEM, an MC block corresponds to a coding block. When a CU is coded with a sub-CU mode (including sub-CU Merge, affine, and frame rate up-conversion (FRUC) modes), each sub-block of the CU is an MC block. To process CU boundaries in a uniform fashion, OBMC is performed at the sub-block level for all MC block boundaries, where the sub-block size is set equal to 4×4, as illustrated in Fig. 3.
When OBMC applies to the current sub-block, besides the current motion vector, the motion vectors of four connected neighboring sub-blocks, if available and not identical to the current motion vector, are also used to derive a prediction block for the current sub-block. These multiple prediction blocks based on multiple motion vectors are combined to generate the final prediction signal of the current sub-block.
The prediction block based on the motion vector of a neighboring sub-block is denoted as PN, with N indicating an index for the above, below, left, or right neighboring sub-block, and the prediction block based on the motion vector of the current sub-block is denoted as PC. When PN is based on the motion information of a neighboring sub-block that contains the same motion information as the current sub-block, OBMC is not performed from PN. Otherwise, every sample of PN is added to the same sample in PC, i.e., four rows/columns of PN are added to PC. The weighting factors {1/4, 1/8, 1/16, 1/32} are used for PN and the weighting factors {3/4, 7/8, 15/16, 31/32} are used for PC. The exceptions are small MC blocks (i.e., when the height or width of the coding block is equal to 4 or when a CU is coded with a sub-CU mode), for which only two rows/columns of PN are added to PC. In this case the weighting factors {1/4, 1/8} are used for PN and the weighting factors {3/4, 7/8} are used for PC. For PN generated based on the motion vector of a vertically (horizontally) neighboring sub-block, samples in the same row (column) of PN are added to PC with the same weighting factor.
Fig. 3 is an example of a sub-block to which OBMC is applied.
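The weighted blending just described can be sketched as follows. This is an illustrative Python sketch covering only the top-boundary case; the function and variable names are assumptions, and sample values are made up for the example.

```python
# Hedged sketch of the OBMC top-boundary blend: samples of the neighbor-MV
# prediction PN are mixed into the current-MV prediction PC row by row with
# the weights {1/4, 1/8, 1/16, 1/32} for PN and the complementary weights
# {3/4, 7/8, 15/16, 31/32} for PC.
def obmc_blend_top(pc, pn, rows=4):
    """pc, pn: equal-sized 2-D lists of samples; rows=2 for small MC blocks,
    in which case only the weights {1/4, 1/8} apply."""
    weights = [1 / 4, 1 / 8, 1 / 16, 1 / 32][:rows]
    out = [row[:] for row in pc]
    for r, w in enumerate(weights):
        out[r] = [w * n + (1 - w) * c for n, c in zip(pn[r], pc[r])]
    return out

pc = [[100.0] * 4 for _ in range(4)]  # current-MV prediction (flat example)
pn = [[200.0] * 4 for _ in range(4)]  # neighbor-MV prediction
blended = obmc_blend_top(pc, pn)
print([row[0] for row in blended])  # [125.0, 112.5, 106.25, 103.125]
```

The weights decay away from the boundary, so the neighbor's influence is strongest in the first row and negligible by the fourth.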
In JEM, for a CU with size less than or equal to 256 luma samples, a CU-level flag is signaled to indicate whether OBMC is applied for the current CU. For CUs with size larger than 256 luma samples or not coded with AMVP mode, OBMC is applied by default. At the encoder, when OBMC is applied to a CU, its impact is taken into account during the motion estimation stage. The prediction signal formed by OBMC using the motion information of the top and left neighboring blocks is used to compensate the top and left boundaries of the original signal of the current CU, and then the normal motion estimation process is applied.
2.3 adaptive motion vector difference resolution
In HEVC, motion vector differences (MVDs) (between the motion vector of a PU and its predicted motion vector) are signaled in units of quarter luma samples when use_integer_mv_flag is equal to 0 in the slice header. In JEM, a locally adaptive motion vector resolution (LAMVR) is introduced. In JEM, an MVD can be coded in units of quarter luma samples, integer luma samples, or four luma samples. The MVD resolution is controlled at the coding unit (CU) level, and MVD resolution flags are conditionally signaled for each CU that has at least one non-zero MVD component.
For a CU with at least one non-zero MVD component, a first flag is signaled to indicate whether quarter-luma sample MV precision is used in the CU. When the first flag (equal to 1) indicates that quarter-luma sample MV precision is not used, another flag is signaled to indicate whether integer-luma sample MV precision or four-luma sample MV precision is used.
When the first MVD resolution flag of a CU is zero, or is not coded for the CU (meaning that all MVDs in the CU are zero), the quarter luma sample MV resolution is used for the CU. When a CU uses integer luma sample MV precision or four luma sample MV precision, the MVPs in the AMVP candidate list of that CU are rounded to the corresponding precision.
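The rounding of AMVP candidates to the selected precision might look like the following sketch. It assumes MV components are stored in quarter-luma-sample units; the function name and the rounding convention are assumptions for illustration, not the normative rule.

```python
# Hedged sketch of rounding an MVP component to the CU's MVD precision,
# for MVs stored in quarter-luma-sample units.
def round_to_precision(mv_q, shift):
    """shift = 2: quarter-pel -> integer-pel; shift = 4: -> four-pel."""
    offset = 1 << (shift - 1)          # rounding offset (half step)
    return ((mv_q + offset) >> shift) << shift

print(round_to_precision(5, 2))   # 4  (1.25 pel rounds to 1 pel)
print(round_to_precision(17, 4))  # 16 (4.25 pel rounds to 4 pel)
```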
In the encoder, CU-level RD checks are used to determine which MVD resolution is used for a CU. That is, the CU-level RD check is performed three times, once for each MVD resolution. To accelerate the encoder, the following encoding schemes are applied in JEM.
During the RD check of a CU with the normal quarter luma sample MVD resolution, the motion information of the current CU (at integer luma sample accuracy) is stored. The stored motion information (after rounding) is used as the starting point for further small-range motion vector refinement during the RD checks of the same CU with integer luma sample and 4 luma sample MVD resolution, so that the time-consuming motion estimation process is not duplicated three times.
The RD check of the CU with 4 luma sample MVD resolution is conditionally invoked. For a CU, when the RD cost of the integer luminance sample MVD resolution is much greater than the RD cost of the quarter luminance sample MVD resolution, the RD check for the 4 luminance sample MVD resolution of the CU will be skipped.
The encoding process is shown in Fig. 4. First, the 1/4-pel MV is tested and the RD cost is calculated and denoted as RDCost0; then the integer MV is tested and the RD cost is denoted as RDCost1. If RDCost1 < th × RDCost0 (where th is a positive threshold), the 4-pel MV is tested; otherwise, the 4-pel MV is skipped. Basically, the motion information and RD cost of the 1/4-pel MV are already known when checking the integer or 4-pel MV, and can be reused to speed up the encoding process of the integer or 4-pel MV.
Fig. 4 is an example of a flow chart for encoding with different MV precision.
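The conditional 4-pel check above can be sketched as follows. The helper name and the threshold value are assumptions; only the comparison RDCost1 < th × RDCost0 comes from the text.

```python
# Illustrative sketch of the conditional 4-pel MVD RD check: the 4-pel
# test is performed only when the integer-pel cost is not much larger
# than the quarter-pel cost. Cost values and th are made up for the example.
def should_test_4pel(rd_cost_quarter, rd_cost_integer, th=1.3):
    """Return True when the 4-luma-sample MVD RD check should run."""
    return rd_cost_integer < th * rd_cost_quarter

print(should_test_4pel(100.0, 120.0))  # True: also test the 4-pel MV
print(should_test_4pel(100.0, 150.0))  # False: skip the 4-pel MV
```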
2.4 higher motion vector storage precision
In HEVC, the motion vector accuracy is one-quarter pel (one-quarter luma sample and one-eighth chroma sample for 4:2:0 video). In JEM, the accuracy for the internal motion vector storage and the Merge candidates is increased to 1/16 pel. The higher motion vector accuracy (1/16 pel) is used in motion compensated inter prediction for a CU coded with skip/Merge mode. For a CU coded with the normal AMVP mode, either integer-pel or quarter-pel motion is used, as described in section 2.3.
SHVC upsampling interpolation filters, which have the same filter length and normalization factor as the HEVC motion compensation interpolation filters, are used as the motion compensation interpolation filters for the additional fractional pel positions. The chroma component motion vector accuracy is 1/32 sample in JEM, and the additional interpolation filters for the 1/32-pel fractional positions are derived by using the average of the filters of the two neighboring 1/16-pel fractional positions.
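The derivation of a 1/32-pel filter by averaging the two neighboring 1/16-pel filters can be illustrated as follows. The coefficients shown are placeholders, not the actual SHVC/JEM filter tables, and the function name is an assumption.

```python
# Hedged sketch: average two adjacent 1/16-pel interpolation filters
# (same tap count, same normalization) to obtain the in-between 1/32-pel
# filter, as described above.
def derive_1_32_filter(f_lo, f_hi):
    """f_lo, f_hi: coefficient lists of the two neighboring 1/16-pel
    filters; returns the averaged 1/32-pel filter."""
    assert len(f_lo) == len(f_hi)
    return [(a + b) // 2 for a, b in zip(f_lo, f_hi)]

f_1_16 = [-2, 62, 4, 0]  # placeholder 4-tap filter at phase k/16
f_2_16 = [-4, 60, 8, 0]  # placeholder filter at phase (k+1)/16
print(derive_1_32_filter(f_1_16, f_2_16))  # [-3, 61, 6, 0]
```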
2.5 affine motion compensated prediction
In HEVC, motion Compensated Prediction (MCP) applies only translational motion models. However, there may be various movements in the real world, such as zoom in/out, rotation, perspective movement, and other irregular movements. Simplified affine transformation motion compensated prediction is applied in JEM. As shown in fig. 5, the affine motion field of a block is described by two control point motion vectors.
Fig. 5 is an example of a simplified affine motion field.
The motion vector field (MVF) of a block is described by the following equation:

  vx = ((v1x − v0x) / w) · x − ((v1y − v0y) / w) · y + v0x
  vy = ((v1y − v0y) / w) · x + ((v1x − v0x) / w) · y + v0y      (1)

where (v0x, v0y) is the motion vector of the top-left corner control point, (v1x, v1y) is the motion vector of the top-right corner control point, and w is the width of the block.
To further simplify the motion compensated prediction, sub-block based affine transform prediction is applied. The sub-block size M×N is derived as in Equation 2, where MvPre is the motion vector fraction accuracy (e.g., 1/16 in JEM), and (v2x, v2y) is the motion vector of the bottom-left control point, calculated according to Equation 1.
After being derived by Equation 2, M and N should be adjusted downward, if necessary, to make them divisors of w and h, respectively.
To derive the motion vector of each M×N sub-block, the motion vector of the center sample of each sub-block, as shown in Fig. 6, is calculated according to Equation 1 and rounded to 1/16 fraction accuracy. Then the motion compensation interpolation filters are applied, and the prediction of each sub-block is generated with the derived motion vector.
Fig. 6 is an example of affine MVF for each sub-block.
After MCP, the high precision motion vector for each sub-block is rounded and saved to the same precision as the normal motion vector.
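The per-sub-block MV derivation and rounding described above can be sketched as follows. This is an illustrative evaluation of the 4-parameter model (Equation 1) at a sub-block center; the function name and the sample control-point values are assumptions, with MVs kept in 1/16-pel units throughout.

```python
# Hedged sketch: evaluate the 4-parameter affine model at a sub-block
# center and round the result back to 1/16-pel integers.
def affine_subblock_mv(v0, v1, w, cx, cy):
    """v0, v1: top-left / top-right control-point MVs (1/16-pel units);
    w: block width in pixels; (cx, cy): sub-block center offset from the
    top-left corner in pixels."""
    a = (v1[0] - v0[0]) / w  # per-pixel horizontal gradient of the field
    b = (v1[1] - v0[1]) / w  # per-pixel vertical gradient
    vx = v0[0] + a * cx - b * cy
    vy = v0[1] + b * cx + a * cy
    return (round(vx), round(vy))

# 16x16 block split into 4x4 sub-blocks: first two centers at (2,2), (6,2).
v0, v1 = (16, 0), (32, 8)
print(affine_subblock_mv(v0, v1, 16, 2, 2))  # (17, 3)
print(affine_subblock_mv(v0, v1, 16, 6, 2))  # (21, 5)
```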
In JEM, there are two affine motion modes: AF_INTER mode and AF_MERGE mode. For CUs with both width and height larger than 8, the AF_INTER mode can be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether the AF_INTER mode is used. In this mode, a candidate list with motion vector pairs {(v0, v1) | v0 = {vA, vB, vC}, v1 = {vD, vE}} is constructed using the neighboring blocks. As shown in Fig. 8, v0 is selected from the motion vectors of block A, B, or C. The motion vector from the neighboring block is scaled according to the reference list and the relationship among the POC of the reference for the neighboring block, the POC of the reference for the current CU, and the POC of the current CU. The approach to select v1 from the neighboring blocks D and E is similar. If the number of candidates in the candidate list is smaller than 2, the list is padded with motion vector pairs composed by duplicating each of the AMVP candidates. When the candidate list is larger than 2, the candidates are first sorted according to the consistency of the neighboring motion vectors (the similarity of the two motion vectors in a pair of candidates), and only the first two candidates are kept. An RD cost check is used to determine which motion vector pair candidate is selected as the control point motion vector prediction (CPMVP) of the current CU, and an index indicating the position of the CPMVP in the candidate list is signaled in the bitstream. After the CPMVP of the current affine CU is determined, affine motion estimation is applied and the control point motion vector (CPMV) is found. Then the difference between the CPMV and the CPMVP is signaled in the bitstream.
Fig. 7 is an example of a 4-parameter affine model (a) and a 6-parameter affine model (b).
Fig. 8 is an example of MVP of af_inter.
In the AF_INTER mode, when the 4/6-parameter affine mode is used, 2/3 control points are required, and therefore 2/3 MVDs need to be coded for these control points, as shown in Fig. 7. In JVET-K0337, it is proposed to derive the MV in such a way that mvd_1 and mvd_2 are predicted from mvd_0.
At the encoder, the MVD of AF_INTER is derived iteratively. Suppose this MVD derivation process is iterated n times; the final MVD is then calculated as follows, where a_i and b_i are the estimated affine parameters, and mvd[k]_h and mvd[k]_v are the horizontal and vertical components of mvd_k (k = 0, 1) derived in the i-th iteration.
With JVET-K0337, i.e., predicting mvd_1 from mvd_0, only the difference relative to mvd_0 is now actually encoded for mvd_1.
When a CU is coded in AF_MERGE mode, it gets the first block coded in affine mode from the valid neighboring reconstructed blocks, and the selection order for the candidate blocks is from left, above, above-right, bottom-left to above-left, as shown in Fig. 9. If the neighboring bottom-left block A is coded in affine mode, as shown in Fig. 9, the motion vectors v2, v3, and v4 of the top-left corner, the top-right corner, and the bottom-left corner of the CU containing block A are derived. The motion vector v0 of the top-left corner of the current CU is calculated according to v2, v3, and v4. Next, the motion vector v1 of the top-right of the current CU is calculated.
After the CPMVs v0 and v1 of the current CU are derived, the MVF of the current CU is generated according to the simplified affine motion model of Equation 1. In order to identify whether the current CU is coded with AF_MERGE mode, an affine flag is signaled in the bitstream when there is at least one neighboring block coded in affine mode.
Fig. 9 is an example of candidates of af_merge.
2.6 intra block copy
Decoder aspect:
In this approach [5], the current (partially) decoded picture is treated as a reference picture. The current picture is placed in the last position of reference picture list 0. Therefore, for a slice that uses the current picture as its only reference picture, its slice type is considered to be a P slice. The bitstream syntax in this approach follows the same syntax structure as inter coding, while the decoding process is unified with inter coding. The only outstanding difference is that the block vector (the motion vector pointing to the current picture) always uses integer-pel resolution.
The changes from the block-level CPR_flag mode are as follows:
In the encoder search for this mode, both block width and block height are less than or equal to 16.
Chroma interpolation is enabled when the luma block vector is an odd integer.
When the SPS flag is on, adaptive motion vector resolution (AMVR) is enabled for CPR mode. In this case, when AMVR is used, a block vector can switch between 1-pel integer and 4-pel integer resolutions at the block level.
Encoder aspect:
The encoder performs RD checks for blocks with width or height no larger than 16. For the non-Merge mode, the block vector search is performed first using a hash-based search. If no valid candidate is found by the hash search, a block-matching based local search is performed.
In the hash-based search, hash key matching (32-bit CRC) between the current block and a reference block is extended to all allowed block sizes. The hash key calculation for every position in the current picture is based on 4×4 blocks. For a current block of a larger size, a hash key match with a reference block occurs when all of its 4×4 sub-blocks match the hash keys at the corresponding reference locations. If multiple reference blocks are found to match the current block with the same hash key, the block vector cost of each candidate block is calculated and the one with the minimum cost is selected.
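The 4×4 hash-key scheme can be sketched as follows. This is an illustrative Python sketch; it uses CRC-32 from the standard zlib module as a stand-in for the codec's CRC, and the sample layout and helper names are assumptions.

```python
import zlib

# Hedged sketch of the hash-based IBC search keys: one 32-bit CRC per 4x4
# block; a larger block matches a reference position only if every one of
# its 4x4 sub-blocks matches the key at the corresponding location.
def hash_4x4(samples):
    """samples: 4x4 list of luma values; returns a 32-bit CRC key."""
    return zlib.crc32(bytes(v & 0xFF for row in samples for v in row))

def block_matches(cur_keys, ref_keys):
    """cur_keys, ref_keys: grids of 4x4 hash keys covering the block."""
    return all(c == r for cr, rr in zip(cur_keys, ref_keys)
               for c, r in zip(cr, rr))

a = [[10, 20, 30, 40]] * 4
b = [[10, 20, 30, 41]] * 4  # differs in one sample
print(hash_4x4(a) == hash_4x4(a))  # True
print(hash_4x4(a) == hash_4x4(b))  # False
```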
In the block matching search, the search range is set to 64 pixels on the left and top of the current block.
Fig. 10 is an example of neighboring blocks of a current block.
3. Example of problems solved by the embodiments
Even when the current PU/CU is not coded in a sub-block mode, OBMC is always performed at the sub-block level, which increases bandwidth and computational complexity. Meanwhile, a fixed 4×4 sub-block size is used, which also leads to bandwidth problems.
4. Examples of the embodiments
To address this problem, OBMC may be performed with larger block sizes or adaptive sub-block sizes. Meanwhile, in some of the proposed methods, motion compensation may be performed only once for each prediction direction.
The techniques listed below should be considered as examples to explain general concepts, and should not be interpreted in a narrow way. Furthermore, these techniques can be combined in any manner. It is proposed that whether and how to apply a deblocking filter may depend on whether dependent scalar quantization is used.
1. It is proposed that in OBMC processing, all sub-blocks within a current block use the same motion information associated with one representative neighboring block.
a. Alternatively, two representative neighboring blocks may be selected. For example, one representative block is selected from the above neighboring blocks, and the other is selected from the left neighboring blocks.
b. Alternatively, in addition to spatial neighboring blocks, the neighboring blocks may also be located in different pictures.
c. In one example, this method is applied only when the current block is not coded with sub-block techniques (e.g., ATMVP, affine).
2. When decoding a video unit (e.g., a block or sub-block), motion information derived from the bitstream may be further modified based on motion information of neighboring blocks, and the modified motion information may be used to derive a final predicted block of the video unit.
a. In one example, a representative neighboring block may be selected and its motion information may be used with the motion information of the current cell to derive modified motion information.
b. Alternatively, motion information of a plurality of representative neighboring blocks may be selected.
c. Further, in one example, each selected piece of motion information may first be scaled to the same reference picture of the current video unit (e.g., for each prediction direction). Then the scaled MV (denoted as neighScaleMvLX) and the MV of the current video unit (denoted as currMvLX) may be used jointly to derive the final MV used in the MC of the video unit (e.g., by weighted averaging).
i. When multiple sets of motion information are selected, neighScaleMvLX may be derived from the multiple scaled motion vectors, e.g., as the weighted average or average of all scaled motion vectors.
in one example, the average MV, denoted as avgMv, is calculated as: avgMv = (w1 × neighScaleMvLX + w2 × currMvLX + offset) >> N, where w1, w2, offset, and N are integers.
1. In one example, w1 and w2 are equal to 1 and 3, respectively, and N is 2 and offset is 2.
2. In one example, w1 and w2 are equal to 1 and 7, respectively, and N is 3 and offset is 4.
d. In one example, the proposed method is applied to the boundary region of the current block, e.g. the top few rows and/or the left few columns of the current block.
i. In one example, for the top and/or left boundary regions of a block, neighScaleMvLX is generated with different representative neighboring blocks; thus two different neighScaleMvLX may be generated for the top and left boundary regions. For the top-left boundary region, either of the two neighScaleMvLX may be used.
e. In one example, the proposed method is performed at the sub-block level. avgMv is derived for each sub-block and used for motion compensation of the sub-block.
f. In one example, the proposed method is performed at the sub-block level only when the current block is encoded in sub-block mode (e.g., ATMVP, STMVP, affine mode, etc.).
g. In one example, one of the neighboring blocks is selected and denoted as the representative neighboring block, and its motion information may be used to derive the final MV. Alternatively, M neighboring blocks may be selected as representative neighboring blocks, e.g., M = 2, with one from the neighboring blocks above the current block and one from the neighboring blocks to the left of the current block.
h. In one example, for a boundary region of a block, or a small M×N (e.g., 4×4) region within the boundary region, the proposed method may not be performed if its representative neighboring block is intra coded.
i. In one example, for a boundary region of a block, if its representative neighboring block is intra-coded, multiple neighboring and/or non-neighboring blocks are examined until an inter-coded block is found, and if no inter-coded block is available, the method is disabled.
i. In one example, the non-adjacent blocks include the above and/or above-left and/or above-right neighboring blocks for a top boundary block of the CU, and the non-adjacent blocks include the above-left and/or bottom-left neighboring blocks for a left boundary block of the CU, as shown in Fig. 10.
in one example, the non-adjacent blocks include the above and/or above-left and/or above-right and/or left neighboring blocks.
in one example, non-neighboring blocks are examined in descending order of distance between the non-neighboring blocks and corresponding boundary blocks.
in one example, only some non-adjacent blocks are examined.
In one example, no more than K non-adjacent blocks are examined.
In one example, the width of the upper right and upper left regions is W/2 and the height of the lower left region is H/2, where W and H are the width and height of the CU.
j. In one example, for a boundary region of a block, if its representative neighboring/non-adjacent block and the current block are both bi-predicted, or are both uni-predicted from the same reference list, the method is performed in each valid prediction direction.
k. In one example, for a boundary region of a block, if its representative neighboring/non-adjacent block is uni-predicted, e.g., predicted from list LX, and the current CU is bi-predicted, or vice versa, the method is performed only on list LX.
i. Alternatively, MV averaging is not performed.
In one example, for a boundary region of a block, the method is not performed if its representative neighboring/non-neighboring block and the current block are both uni-directionally predicted and predicted from different directions.
i. Alternatively, MVs of neighboring/non-neighboring blocks are scaled to a reference picture of the current block, and MV averaging is performed.
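The integer weighted average of item 2.c above can be illustrated with both example weight settings listed there. The function and variable names (avg_mv, neigh_scale_mv) are assumptions, and the MV values are made up for the example.

```python
# Hedged sketch of avgMv = (w1 * neighScaleMvLX + w2 * currMvLX + offset) >> N,
# applied component-wise to integer MVs.
def avg_mv(neigh, curr, w1, w2, offset, n):
    """neigh: neighbor MV already scaled to the current reference picture;
    curr: MV of the current video unit."""
    return tuple((w1 * a + w2 * b + offset) >> n
                 for a, b in zip(neigh, curr))

neigh_scale_mv = (8, -4)
curr_mv = (16, 4)
print(avg_mv(neigh_scale_mv, curr_mv, 1, 3, 2, 2))  # weights (1, 3), N = 2
print(avg_mv(neigh_scale_mv, curr_mv, 1, 7, 4, 3))  # weights (1, 7), N = 3
```

With either setting the current MV dominates, so the neighbor only nudges the result toward the boundary motion.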
3. It is proposed that the motion information of one or more representative neighboring blocks may be used jointly to generate an additional prediction block (denoted as neighPredLX) of a video unit (a block or a sub-block). Suppose the prediction block generated with currMvLX is denoted as currPredLX; then neighPredLX and currPredLX can be used jointly to generate the final prediction block of the video unit.
a. In one example, the motion information of multiple representative neighboring blocks may first be scaled to the same reference picture of the current video unit (e.g., for each prediction direction), and the scaled MVs are then used jointly (e.g., by average/weighted average) to derive neighScaleMvLX, and neighPredLX is generated based on neighScaleMvLX.
b. In one example, the proposed method is applied to the boundary region of the current block, e.g. the top few rows and/or the left few columns of the current block.
i. In one example, for the top and left boundary regions of a block, neighScaleMvLX is generated with different representative neighboring blocks; thus two different neighScaleMvLX may be generated for the top and left boundary regions. For the top-left boundary region, either of the two neighScaleMvLX may be used.
c. In one example, the MV scaling process may be skipped.
d. In one example, the proposed method is performed at the sub-block level only when the current block is encoded in sub-block mode (e.g., ATMVP, STMVP, affine mode, etc.).
4. It is proposed that when OBMC and local illumination compensation (LIC) work together, some temporary prediction blocks may be derived using the same LIC parameters for the current MV and the neighboring MVs, i.e., the same LIC parameters are used with either the current MV or a neighboring MV.
a. In one example, for each prediction direction, the current MV may be used to derive LIC parameters for the current MV and neighboring MVs.
b. In one example, for each prediction direction, neighboring MVs may be used to derive LIC parameters for the current MV and neighboring MVs.
c. In one example, different LIC parameters may be derived for the current MV and the neighboring MVs, and different LIC parameters may be derived for the different neighboring MVs.
d. In one example, LIC may not be performed on neighboring MVs.
e. In one example, LIC may not be performed on a neighboring MV only when the LIC flag of the corresponding neighboring block is false.
f. In one example, when the LIC flags of at least N (N >= 0) neighboring blocks are false, LIC may not be performed on any neighboring MV.
g. In one example, when the LIC flags of at least N (N >= 0) neighboring blocks are true, LIC may be performed on all neighboring MVs.
h. In one example, if the LIC flag of the current block is false, LIC may be performed for neither the current MV nor the neighboring MVs.
i. In one example, even if the LIC flag of the current block is false, LIC may still be performed on a neighboring MV if the LIC flag of the corresponding neighboring block is true.
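Item 4 can be pictured with a small sketch in which LIC is the usual linear model p' = (a*p >> shift) + b and the same (a, b) pair is reused for the temporary prediction blocks obtained with the current MV and with a neighboring MV. The parameter values and the shift are illustrative assumptions.

```python
def apply_lic(pred_samples, a, b, shift=5):
    """Apply a linear illumination compensation model
    p' = (a * p >> shift) + b to a list of prediction samples."""
    return [((a * p) >> shift) + b for p in pred_samples]

# Per items 4.a/4.b: one (a, b) pair, e.g. derived using the current MV,
# is shared by both temporary prediction blocks.
a, b = 33, 2  # illustrative LIC parameters
lic_cur = apply_lic([100, 102, 98], a, b)    # block predicted with current MV
lic_neigh = apply_lic([101, 103, 97], a, b)  # block predicted with neighbor MV
```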
5. It is proposed that when OBMC and GBI work together, the current MV and neighboring MVs may use the same GBI index.
a. In one example, the GBI index of the current block may be used for the current MV and neighboring MVs.
b. In one example, different GBI indices may be used for the current MV and the neighboring MVs, and different GBI indices may be used for different neighboring MVs. For example, for an MV (the current MV or a neighboring MV), the GBI index of the corresponding block is used.
c. In one example, a default GBI index may be used for all neighbor MVs.
i. For example, the GBI index indicating equal [1,1] weights is used as the default GBI index.
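The GBI behavior of item 5 can be sketched as a table of weight pairs indexed by a GBI index, with the current block's index reused when blending the temporary prediction blocks for both the current MV and neighboring MVs (item 5.a). The weight table below is illustrative; index 0 plays the role of the default [1,1] index of item 5.c.

```python
# Illustrative GBI weight table; index 0 is the equal-weight default.
GBI_WEIGHTS = [(1, 1), (3, 1), (1, 3), (5, 3), (3, 5)]

def gbi_blend(pred0, pred1, gbi_index):
    """Blend list-0 and list-1 predictions with the weight pair
    selected by gbi_index, with rounding."""
    w0, w1 = GBI_WEIGHTS[gbi_index]
    total = w0 + w1
    return [(w0 * p0 + w1 * p1 + total // 2) // total
            for p0, p1 in zip(pred0, pred1)]

# Item 5.a: the same index (here 1) is used for the prediction blocks
# generated with the current MV and with a neighboring MV.
cur_blend = gbi_blend([100, 104], [108, 96], 1)
neigh_blend = gbi_blend([102, 106], [110, 94], 1)
```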
6. It is proposed that in OBMC, DMVD may not be applied to the neighboring MVs.
7. It is proposed that OBMC can be performed after DMVD when the two tools work together.
a. In one example, OBMC may be performed prior to DMVD to modify the prediction samples, and the modified prediction samples are then used in DMVD.
i. In one example, the output of DMVD can be used as the final prediction of the block.
b. In one example, the modified prediction samples may be used only to derive motion vectors in DMVD. After the DMVD is completed, the prediction block generated in OBMC may be further used to modify the final prediction of the block.
8. It is proposed that whether neighboring MVs are used in OBMC may depend on the block size or the motion information of the neighboring blocks.
a. In one example, if the size of a neighboring block is 4x4, 4x8, 8x4, 4x16, and/or 16x4, its MV may not be used in OBMC.
b. In one example, if the size of a neighboring block is 4x4, 4x8, 8x4, 4x16, and/or 16x4 and the block is bi-predicted, its MVs may not be used in OBMC.
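Items 8.a/8.b amount to a small gating rule; the sketch below encodes it directly. Treating the listed sizes as the exact exclusion set, and the require_bi switch for distinguishing variant 8.a from 8.b, are assumptions for illustration.

```python
# Sizes for which a neighboring MV may be excluded from OBMC (item 8).
SMALL_SIZES = {(4, 4), (4, 8), (8, 4), (4, 16), (16, 4)}

def use_neighbor_mv_in_obmc(neigh_w, neigh_h, neigh_is_bi, require_bi=False):
    """Return True if the neighboring block's MV may be used in OBMC."""
    if (neigh_w, neigh_h) not in SMALL_SIZES:
        return True
    # Variant 8.a (require_bi=False): small blocks are always excluded.
    # Variant 8.b (require_bi=True): only small bi-predicted blocks are.
    return require_bi and not neigh_is_bi
```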
9. It is proposed that blocks coded in combined intra and inter prediction (CIIP) mode may be prohibited from using some neighboring pixels for intra prediction.
a. In one example, if a neighboring pixel of the CIIP-mode block is from an intra-coded block, it may be deemed unavailable.
b. In one example, if a neighboring pixel of the CIIP-mode block is from an inter-coded block, it may be deemed unavailable.
c. In one example, if a neighboring pixel of the CIIP-mode block is from a CIIP-mode block, it may be deemed unavailable.
d. In one example, if a neighboring pixel of the CIIP-mode block is from a CPR-mode block, it may be deemed unavailable.
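The availability rule of item 9 can be sketched as a single predicate over the coding mode of the block a neighboring pixel comes from. Which modes are excluded is exactly what alternatives 9.a through 9.d vary, so the excluded set is a parameter here; the default shown is an arbitrary illustrative choice.

```python
def neighbor_available_for_ciip(neigh_mode, excluded_modes=("CIIP", "CPR")):
    """Return False (unavailable for the intra part of CIIP) when the
    neighboring pixel comes from a block coded in an excluded mode.
    Modes are plain strings here, e.g. "INTRA", "INTER", "CIIP", "CPR"."""
    return neigh_mode not in excluded_modes
```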
10. It is proposed that in early termination of BIO, sub-block (or block) level early termination may not be applicable.
a. Whether BIO is applied can only be decided at the sub-block level.
b. A block may be divided into sub-blocks, and the decision whether to apply BIO may depend simply on the sub-block itself, without reference to other sub-blocks.
11. It is proposed that for LIC-coded blocks, OBMC is not applied, regardless of the motion information of the block.
a. Alternatively, for uni-directionally predicted LIC-coded blocks, OBMC may still be applied. Alternatively, in addition, how and/or when OBMC is applied may further depend on the coding information of the current block and/or neighboring blocks, such as block size and prediction direction.
12. The sub-block size used in OBMC or in the proposed methods may depend on the block size, block shape, motion information, or reference picture of the current block (assuming the current block size is WxH).
a. In one example, sub-block size M1xM2 is used for blocks with WxH >= T, and sub-block size N1xN2 is used for other blocks.
b. In one example, if W >= T, the width/height of the sub-block is set to M1; otherwise, it is set to N1.
c. In one example, sub-block size M1xM2 is used for uni-directionally predicted blocks and sub-block size N1xN2 is used for other blocks.
d. In one example, M1xM2 is 4x4.
e. In one example, M1xM2 is (W/4)x4 for the upper region and 4x(H/4) for the left region.
f. In one example, M1xM2 is (W/4)x2 for the upper region and 4x(H/2) for the left region.
g. In one example, N1xN2 is 8x8, 8x4, or 4x8.
h. In one example, N1xN2 is (W/2)x4 for the upper region and 4x(H/2) for the left region.
i. In one example, N1xN2 is (W/2)x2 for the upper region and 2x(H/2) for the left region.
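Item 12.a reduces to a threshold test on the block area; the sketch below shows it with illustrative values T = 256, M1xM2 = 4x4, and N1xN2 = 8x8 (none of these concrete values are mandated above).

```python
def obmc_subblock_size(w, h, T=256, m1m2=(4, 4), n1n2=(8, 8)):
    """Item 12.a: blocks with w*h >= T use sub-block size M1xM2,
    other blocks use N1xN2. All concrete values are illustrative."""
    return m1m2 if w * h >= T else n1n2
```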
13. The proposed method or OBMC may be applied to specific modes, block sizes/shapes and/or specific sub-block sizes.
a. The proposed method is applicable to specific modes, such as conventional translational motion (i.e., affine mode is disabled).
b. The proposed method can be applied to specific block sizes.
i. In one example, it is applied only to blocks with WxH >= T, where W and H are the width and height of the current block, e.g., T is 16 or 32.
ii. In another example, it is applied only to blocks with W >= T, e.g., T is 8.
iii. Alternatively, it is applied only to blocks with W >= T1 and H >= T2, e.g., T1 and T2 are equal to 8.
iv. Alternatively, in addition, it is not applied to blocks with W >= T1 and/or H >= T2. For example, T1 and T2 are equal to 128.
c. The use of the proposed method may be invoked under further conditions (e.g., based on block size, block shape, coding mode, slice type, low-latency check flag, temporal layer, etc.).
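The size conditions of item 13.b can be combined into one gating function, sketched below with the example thresholds given above (T = 16 as the enabling area threshold, T1 = T2 = 128 as the disabling size threshold). Combining 13.b.i with 13.b.iv in a single function, and the exact values, are illustrative assumptions.

```python
def method_enabled(w, h, T=16, T1=128, T2=128):
    """Gate the proposed method on block size (items 13.b.i and 13.b.iv)."""
    if w >= T1 and h >= T2:    # disabled for very large blocks
        return False
    return w * h >= T          # enabled only when the area reaches T
```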
14. If a video unit (e.g., a block or sub-block) is encoded using Intra Block Copy (IBC) mode, OBMC may be applied to the video unit.
a. In one example, the one or more representative neighboring blocks are invoked only when they are coded in intra block copy mode. Alternatively, in addition, only motion information from such neighboring blocks is used in OBMC.
b. In one example, if one block is encoded with a sub-block technique (e.g., ATMVP) and some sub-blocks are encoded with IBC mode, then OBMC may still be applied to sub-blocks that are not IBC encoded. Alternatively, OBMC may be disabled for the entire block.
c. Alternatively, OBMC is disabled in intra block copy mode.
15. The proposed methods are applicable to all color components. Alternatively, they may be applied only to certain color components; for example, only to the luma component.
16. Whether and how the proposed methods are applied may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU/CU/group of CTUs/group of CUs.
Fig. 11 is a block diagram of a video processing apparatus 1100. The apparatus 1100 may be used to implement one or more methods described herein. The apparatus 1100 may be implemented in a smart phone, tablet, computer, internet of things (IoT) receiver, or the like. The apparatus 1100 may include one or more processors 1102, one or more memories 1104, and video processing hardware 1106. The processor 1102 may be configured to implement one or more of the methods described herein. The memory(s) 1104 may be used to store data and code for implementing the methods and techniques described herein. Video processing hardware 1106 may be used to implement some of the techniques described herein in hardware circuitry.
Fig. 13 is a flow chart of a method 1300 for processing video. The method 1300 includes: determining (1305) that the first video block is adjacent to the second video block, determining (1310) motion information for the second video block, and performing (1315) further processing of sub-blocks of the first video block based on the motion information for the second video block.
Fig. 14 is a flow chart of a method 1400 for processing video. The method 1400 includes: determining (1405) that the first video block is adjacent to the second video block, determining (1410) motion information of the second video block, modifying (1415) the motion information of the first video block based on the motion information of the second video block to generate modified motion information of the first video block, determining (1420) a prediction block of the first video block based on the modified motion information, and performing (1425) further processing of the first video block based on the prediction block.
Fig. 15 is a flow chart of a method 1500 for processing video. The method 1500 includes: determining (1505) that the first video block is coded using Intra Block Copy (IBC) mode, and processing (1510) the first video block using Overlapped Block Motion Compensation (OBMC) based on the determination that the first video block is coded using intra block copy mode.
Some examples of determining candidates for encoding and their use are described in section 4 herein with reference to methods 1300, 1400, and 1500. For example, as described in section 4, a sub-block of a first video block may be processed based on motion information of a second video block adjacent to the first video block.
Referring to methods 1300, 1400, and 1500, video blocks may be encoded in a video bitstream, wherein bit efficiency may be achieved by using bitstream generation rules related to motion information prediction.
The method may include: determining, by the processor, that the first video block is adjacent to the third video block; and determining, by the processor, motion information for a third video block, wherein further processing of sub-blocks of the first video block is performed based on the motion information for the third video block, one of the second video block or the third video block being located above the first video block and the other being located to the left of the first video block.
The method may include: wherein the first video block is from a first picture and the second video block is from a second picture, the first picture and the second picture being different pictures.
The method may include: wherein the first video block and the second video block are within the same picture.
The method may include: wherein the method is applied based on the first video block not being encoded using a sub-block technique.
The method may include wherein the modified motion information is further based on motion information of a third video block adjacent to the first video block.
The method may include: wherein the motion information of the second video block and the motion information of the first video block are scaled based on a reference picture associated with the first video block, the modification being based on the scaled motion information.
The method may include: wherein the scaled motion information is based on a weighted average or average of scaled motion vectors from the scaled motion information.
The method may include: wherein an average of scaled motion vectors is calculated as avgMv = (w1 * neighScaleMvLX + w2 * currMvLX + offset) >> N, where w1, w2, offset, and N are integers, avgMv is the average of the scaled motion vectors, neighScaleMvLX is the representative scaled motion vector, and currMvLX is the motion vector of the first video block.
The method may include: where w1 is 1, w2 is 3, N is 2, and offset is 2.
The method may include: where w1 is 1, w2 is 7, N is 3, and offset is 4.
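The weighted average avgMv = (w1*neighScaleMvLX + w2*currMvLX + offset) >> N, applied per MV component, can be checked with a short sketch using the two example parameter sets:

```python
def avg_mv(neigh_scale_mv, curr_mv, w1, w2, offset, n):
    """avgMv = (w1*neighScaleMvLX + w2*currMvLX + offset) >> N,
    evaluated independently for the x and y components."""
    return tuple((w1 * a + w2 * b + offset) >> n
                 for a, b in zip(neigh_scale_mv, curr_mv))

mv_13 = avg_mv((8, 4), (16, 8), w1=1, w2=3, offset=2, n=2)  # 1:3 weighting
mv_17 = avg_mv((8, 4), (16, 8), w1=1, w2=7, offset=4, n=3)  # 1:7 weighting
```

With neighScaleMvLX = (8, 4) and currMvLX = (16, 8), the first parameter set yields (14, 7) and the second (15, 8); both stay close to the current MV, as the weights intend.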
The method may include: wherein the method is applied to a border area of the first video block, the border area comprising a plurality of rows of blocks above the first video block, and the border area comprising a plurality of columns of blocks to the left of the first video block.
The method may include: wherein neighScaleMvLX is based on a first neighScaleMvLX associated with the rows of blocks above the first video block in the boundary region and on a second neighScaleMvLX associated with the columns of blocks to the left of the first video block in the boundary region.
The method may include: wherein one or both of the first and second neighScaleMvLX are used for the upper-left boundary region of the first video block.
The method may include: wherein the method is performed at a sub-block level, where avgMv is derived per sub-block and motion compensation of each sub-block is based on its avgMv.
The method may include: wherein the method is performed at a sub-block level based on the first video block being coded in sub-block mode.
The method may include: a motion vector for the first video block is determined based on the motion information for the second video block.
The method may include: wherein the motion vector of the first video block is further based on motion information of a third video block, one of the second video block or the third video block being located above the first video block and the other being located to the left of the first video block.
The method may include: wherein if the second video block is intra-coded, the method is not performed.
The method may include: determining, by the processor, that the second video block is within the boundary region; determining, by the processor, that the second video block is intra-coded; determining, by the processor, that a third video block within the boundary region is inter-frame coded, the third video block being adjacent or non-adjacent to the first video block; and performing further processing of the first video block based on the third video block.
The method may include: wherein the non-adjacent blocks include one or more of the above, above-left, or above-right video blocks of the top boundary region, and the non-adjacent blocks include one or more of the left, above-left, or below-left blocks of the left boundary region.
The method may include: wherein the non-adjacent blocks include one or more of the above, above-left, above-right, left, or below-left video blocks of the boundary region.
The method may include: wherein the non-adjacent blocks are checked in descending order of the distance between the non-adjacent block and the video block within the boundary region in order to identify the third video block.
The method may include: wherein a subset of non-adjacent blocks is examined to identify a third video block.
The method may include: wherein the number of non-adjacent blocks checked to identify the third video block is less than or equal to a threshold K.
The method may include: wherein the width of the upper right and upper left regions is W/2 and the height of the lower left region is H/2, wherein W and H are the width and height of the first video block, which is a codec unit.
The method may include: the first video block and the third video block are determined to be bi-directionally predicted or uni-directionally predicted from a reference list, wherein the method is performed in each effective prediction direction.
The method may include: wherein the third video block is uni-directionally predicted and the first video block is bi-directionally predicted.
The method may include: wherein the third video block is bi-directionally predicted and the first video block is uni-directionally predicted.
The method may include: wherein no motion vector averaging is performed.
The method may include: the third video block and the first video block are determined to be unidirectionally predicted and predicted from different directions, wherein the method is not performed based on the determination.
The method may include: wherein the motion vector of the third video block is scaled to the reference picture of the first video block and motion vector averaging is performed.
The method may include: determining motion information for one or more neighboring blocks of the first video block; and determining a prediction block of the first video block based on the motion information of the one or more neighboring blocks.
The method may include: wherein the motion information of the one or more neighboring blocks is scaled to the reference picture of the first video block to generate scaled motion information comprising a scaled motion vector, the scaled motion vector being used to determine the prediction block.
The method may comprise: wherein the method is applied to a border region of the first video block.
The method may include: wherein neighScaleMvLX is based on different neighboring blocks for the upper boundary region and the left boundary region of the first video block, so that two different neighScaleMvLX values are derived for the upper and left boundary regions, wherein either of the two neighScaleMvLX values is used for the upper-left boundary region.
The method may include: wherein the motion vector scaling process is skipped.
The method may include: wherein the method is performed at a sub-block level based on the first block encoded and decoded in sub-block mode.
The method may include: wherein the sub-block size is based on a block size, a block shape, motion information, or a reference picture of the first video block.
The method may include: wherein for blocks with width x height greater than or equal to T, the sub-block size is M1xM2, and for blocks with width x height less than T, the sub-block size is N1xN2.
The method may include: wherein the width/height of the sub-block is set to M1 based on the width of the block being greater than or equal to T, and is set to N1 if the width of the block is less than T.
The method may include: wherein for unidirectional prediction blocks the sub-block size is M1xM2 and for other blocks the sub-block size is N1xN2.
The method may include: wherein M1xM2 is 4x4.
The method may include: wherein M1xM2 is (W/4)x4 for the upper region and M1xM2 is 4x(H/4) for the left region.
The method may include: wherein M1xM2 is (W/4)x2 for the upper region and M1xM2 is 4x(H/2) for the left region.
The method may include: wherein N1xN2 is 8x8, 8x4, or 4x8.
The method may include: wherein N1xN2 is (W/2)x4 for the upper region and N1xN2 is 4x(H/2) for the left region.
The method may include: wherein N1xN2 is (W/2)x2 for the upper region and N1xN2 is 2x(H/2) for the left region.
The method may include: wherein the method is applied to a conventional translational movement pattern.
The method may include: wherein the width x height of the first video block is greater than or equal to T, and T is 16 or 32.
The method may include: wherein the first video block has a width greater than or equal to T and a height greater than or equal to T, and T is 8.
The method may include: wherein the first video block has a width greater than or equal to T1 and a height greater than or equal to T2, and T1 and T2 are 8.
The method may include: wherein the method is not applied to video blocks having a width greater than or equal to T1 or a height greater than or equal to T2, T1 and T2 being 128.
The method may include: wherein the method is performed based on determining conditions according to one or more of: block size, block shape, codec mode, slice type, low latency check flag, or temporal layer.
The method may include: determining that one or more neighboring blocks of the first video block are coded using intra block copy mode, wherein the first video block is processed using OBMC based on the determination that the one or more neighboring blocks are coded using intra block copy mode.
The method may include: determining that the first video block includes a sub-block coded with IBC mode; and processing, with OBMC, the sub-blocks of the first video block that are not coded with IBC mode.
The method may include: determining that the first video block includes a sub-block coded with IBC mode; and processing the sub-blocks without using OBMC, based on the sub-blocks being coded with IBC mode.
The method may include: wherein the method is applied to one or more color components.
The method may comprise: wherein the method is applied to the luminance component.
The method may include: wherein the method is applied based on a signal from the encoder to the decoder, the signal being provided by a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a picture header, a slice group, a slice header, a Codec Tree Unit (CTU), a Codec Unit (CU), a CTU group or a CU group.
It should be appreciated that the disclosed techniques may be implemented in video encoders or decoders to improve compression efficiency when the compressed codec unit has a shape that is significantly different from a conventional square block or half square rectangular block. For example, new codec tools that use long or high codec units (such as 4x32 or 32x4 sized units) may benefit from the disclosed techniques.
Fig. 16 is a schematic diagram illustrating an example of a structure of a computer system or other control device 2600 that may be used to implement various portions of the disclosed technology. In fig. 16, computer system 2600 includes one or more processors 2605 and memory 2610 connected by an interconnect 2625. Interconnect 2625 may represent any one or more separate physical buses, point-to-point connections, or both, connected by appropriate bridges, adapters, or controllers. Accordingly, the interconnect 2625 may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or Industry Standard Architecture (ISA) bus, a Small Computer System Interface (SCSI) bus, a Universal Serial Bus (USB), an IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (sometimes referred to as "FireWire").
The processor 2605 may include a Central Processing Unit (CPU) to control overall operation of the host, for example. In some embodiments, the processor 2605 achieves this by executing software or firmware stored in the memory 2610. The processor 2605 may be or include one or more programmable general or special purpose microprocessors, digital Signal Processors (DSPs), programmable controllers, application Specific Integrated Circuits (ASICs), programmable Logic Devices (PLDs), or the like, or a combination of such devices.
Memory 2610 may be or include the main memory of a computer system. Memory 2610 represents any suitable form of Random Access Memory (RAM), read Only Memory (ROM), flash memory, etc., or a combination of these devices. In use, memory 2610 may contain, among other things, a set of machine instructions that when executed by processor 2605 cause processor 2605 to perform operations to implement embodiments of the disclosed technology.
Also connected to the processor 2605 through the interconnect 2625 is an (optional) network adapter 2615. The network adapter 2615 provides the computer system 2600 with the ability to communicate with remote devices, such as storage clients and/or other storage servers, and may be, for example, an Ethernet adapter or a Fibre Channel adapter.
Fig. 17 illustrates a block diagram of an example embodiment of a mobile device 2700 that may be used to implement various portions of the disclosed technology. The mobile device 2700 may be a notebook computer, smart phone, tablet computer, video camera, or other device capable of processing video. The mobile device 2700 includes a processor or controller 2701 to process data and a memory 2702 in communication with the processor 2701 to store and/or buffer data. For example, the processor 2701 may include a Central Processing Unit (CPU) or a microcontroller unit (MCU). In some implementations, the processor 2701 may include a Field Programmable Gate Array (FPGA). In some implementations, the mobile device 2700 includes or communicates with a Graphics Processing Unit (GPU), a Video Processing Unit (VPU), and/or a wireless communication unit to implement various visual and/or communication data processing functions of the smartphone device. For example, memory 2702 may include and store processor-executable code that, when executed by processor 2701, configures mobile device 2700 to perform various operations, such as receiving information, commands, and/or data, processing information and data, and transmitting or providing processed information/data to another device, such as an actuator or an external display. To support various functions of the mobile device 2700, the memory 2702 can store information and data, such as instructions, software, values, images, and other data processed or referenced by the processor 2701. For example, the storage functions of memory 2702 may be implemented using various types of Random Access Memory (RAM) devices, Read Only Memory (ROM) devices, flash memory devices, and other suitable storage media. In some implementations, the mobile device 2700 includes an input/output (I/O) unit 2703 to interface the processor 2701 and/or the memory 2702 with other modules, units, or devices.
For example, I/O unit 2703 may interface with processor 2701 and memory 2702 to utilize various wireless interfaces compatible with typical data communication standards, e.g., between one or more computers and user devices in the cloud. In some implementations, the mobile device 2700 may interface with other devices using a wired connection through the I/O unit 2703. I/O unit 2703 may include a wireless sensor, such as an infrared detector for detecting remote control signals, or other suitable wireless human-machine interface technology. The mobile device 2700 may also be connected to other external interfaces (e.g., data storage) and/or visual or audio display devices 2704 to retrieve and transmit data and information that may be processed by the processor, stored by the memory, or displayed on the display device 2704 or an output unit of the external device. For example, the display device 2704 may display video frames modified based on MVP in accordance with the disclosed techniques.
Fig. 18 is a block diagram illustrating an example video processing system 1800 in which various techniques disclosed herein may be implemented. Various implementations may include some or all of the components of the system 1800. The system 1800 may include an input 1802 for receiving video content. The video content may be received in a raw or uncompressed format, such as 8-bit or 10-bit multi-component pixel values, or may be received in a compressed or encoded format. The input 1802 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as Ethernet, Passive Optical Network (PON), etc., and wireless interfaces such as Wi-Fi or cellular interfaces.
The system 1800 may include a codec component 1804 that may implement the various coding or encoding methods described herein. The codec component 1804 may reduce the average bit rate of the video from the input 1802 to the output of the codec component 1804 to produce a coded representation of the video. Coding techniques are therefore sometimes called video compression or video transcoding techniques. The output of the codec component 1804 may be stored, or transmitted via a communication connection, as represented by the component 1806. The stored or communicated bitstream (or coded) representation of the video received at the input 1802 may be used by the component 1808 for generating pixel values or displayable video that is sent to a display interface 1810. The process of generating user-viewable video from the bitstream representation is sometimes called video decompression. Furthermore, while certain video processing operations are referred to as "coding" operations or tools, it will be appreciated that the coding tools or operations are used at an encoder, and corresponding decoding tools or operations that reverse the results of the coding will be performed by a decoder.
Examples of the peripheral bus interface or the display interface may include a Universal Serial Bus (USB) or a High Definition Multimedia Interface (HDMI) or a display port, etc. Examples of storage interfaces include SATA (serial advanced technology attachment), PCI, IDE interfaces, and the like. The techniques described herein may be implemented in various electronic devices such as mobile phones, notebook computers, smartphones, or other devices capable of performing digital data processing and/or video display.
Fig. 19 is a flow chart of an example method of video processing. The method 1900 includes: during a transition between a current block of visual media data and a corresponding coded representation of the visual media data, determining (at step 1902) at least one neighboring block of the current block; determining (at step 1904) motion information of the at least one neighboring block; and performing (at step 1906) Overlapped Block Motion Compensation (OBMC) on the current block based on the motion information of the at least one neighboring block; wherein the OBMC tool generates the final prediction of one sub-block of the current block using an intermediate prediction of that sub-block and predictions of at least one neighboring sub-block.
In some implementations, additional modifications may be performed to method 1900. For example, performing OBMC on the current block based on motion information of at least one neighboring block includes: OBMC is performed on all sub-blocks of the current block based on motion information of at least one neighboring block. The at least one neighboring block includes a first neighboring block located above the current block and a second neighboring block located at the left side of the current block. At least one neighboring block and the current block are from different pictures of the visual media data. The method is applied only when the current block is not encoded using the sub-block technique.
Fig. 20 is a flow chart of an example method of video processing. The method 2000 includes: during a transition between a current block of visual media data and a corresponding coded representation of the visual media data, determining (at step 2002) at least one neighboring block of the current block; determining (at step 2004) motion information of the at least one neighboring block; modifying (at step 2006) the motion information of the current block based on the motion information of the at least one neighboring block to generate modified motion information of the current block; and performing (at step 2008) processing of the current block based on the modified motion information.
In some implementations, additional modifications may be performed to method 2000. For example, modifying the motion information of the current block based on the motion information of at least one neighboring block to generate modified motion information of the current block includes: modifying the motion information of the current block based on the motion information of the at least one neighboring block and the motion information of the current block. Modifying the motion information of the current block includes: scaling the motion information of the at least one neighboring block to the same reference picture as the current block; and modifying the motion information of the current block based on the scaled motion information of the at least one neighboring block and the motion information of the current block. The scaled motion information of the at least one neighboring block is averaged or weighted-averaged to generate one representative scaled motion vector for each reference picture list of the current block. The modified motion information of the current block is generated as a weighted average of the representative scaled motion vector and the motion vector of the current block. The modified motion vector is calculated as: avgMv = (w1 * neighScaleMvLX + w2 * currMvLX + offset) >> N, where w1, w2, offset, and N are integers, avgMv is the modified motion vector, neighScaleMvLX is the representative scaled motion vector, currMvLX is the motion vector of the current block, and X is a reference picture list, where X = 0, 1. In one example, w1 is 1, w2 is 3, N is 2, and offset is 2; in another, w1 is 1, w2 is 7, N is 3, and offset is 4. Performing processing of the current block based on the motion information of the at least one neighboring block includes: performing processing on a boundary region of the current block, wherein the boundary region includes a plurality of top rows and/or left columns of the current block.
Representative motion vectors are generated for the top rows of the current block and the left columns of the current block using different neighboring blocks, respectively. The method is applied at the sub-block level only when the current block is encoded using a sub-block technique. When at least one neighboring block of the boundary region is intra-coded, the method is not performed on that boundary region of the current block. When at least one neighboring block is intra-coded, the method further includes: checking neighboring blocks and/or non-neighboring blocks until an inter-coded block is found, and disabling the motion vector modification process in response to no inter-coded block being found. The non-neighboring blocks include upper and/or upper left and/or upper right neighboring blocks of the top boundary region of the current block, and the non-neighboring blocks include upper left and/or lower left neighboring blocks of the left boundary region of the current block. The non-neighboring blocks may include upper and/or upper left and/or upper right and/or left and/or lower left neighboring blocks. The non-neighboring blocks are checked in descending order of distance between the non-neighboring blocks and the corresponding blocks in the boundary region. A subset of the non-neighboring blocks is checked, or a number of non-neighboring blocks not greater than a threshold K is checked. The width of the upper right and upper left regions is W/2, and the height of the lower left region is H/2, where W and H are the width and height of the current block as a codec unit. The method is performed in each active prediction direction when at least one neighboring block/non-neighboring block and the current block are bi-directionally predicted or uni-directionally predicted from the same reference list.
Modified motion information is generated for a first list when at least one neighboring block/non-neighboring block is uni-directionally predicted from the first list and the current block is bi-directionally predicted, or when at least one neighboring block/non-neighboring block is bi-directionally predicted and the current block is uni-directionally predicted from the first list; alternatively, in these cases no modified motion information is generated. When at least one neighboring block/non-neighboring block and the current block are uni-directionally predicted and predicted from different directions, no modified motion information is generated; alternatively, the motion vector of the neighboring block/non-neighboring block is scaled to a reference picture of the current block and modified motion information is generated.
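As a purely illustrative sketch (not part of any claimed embodiment), the fixed-point weighted averaging of a representative scaled neighboring motion vector with the current motion vector may be written as follows; the function name and the tuple representation of motion vectors are assumptions made for illustration:

```python
def modified_mv(neigh_scale_mv, curr_mv, w1=1, w2=3, offset=2, shift=2):
    """Blend a representative scaled neighboring MV with the current MV.

    Implements avgMv = (w1 * neighScaleMvLX + w2 * currMvLX + offset) >> N
    per component. The defaults (w1=1, w2=3, offset=2, N=2) match one
    example parameter set in the description; w1=1, w2=7, offset=4, N=3
    is the other.
    """
    return tuple((w1 * n + w2 * c + offset) >> shift
                 for n, c in zip(neigh_scale_mv, curr_mv))
```

Note that the arithmetic right shift rounds toward negative infinity, which is the usual convention for integer arithmetic in video codecs.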
Fig. 21 is a flow chart of an example method of video processing. The method 2100 includes: during a transition between a current block of visual media data and a corresponding codec representation of the visual media data, determining (at step 2102) a plurality of neighboring blocks of the current block; determining (at step 2104) motion information for the plurality of neighboring blocks; determining (at step 2106) a first prediction block of the current block based on motion information of the current block; determining (at step 2108) a second prediction block for the current block based on the motion information of the plurality of neighboring blocks; modifying (at step 2110) the first prediction block based on the second prediction block; and performing (at step 2112) processing of the current block based on the first prediction block.
In some implementations, additional modifications may be performed to method 2100. For example, the motion information of one of the plurality of neighboring blocks is scaled to a reference picture of the current block to generate representative scaled motion information for determining the second prediction block of the current block. Modifying the first prediction block further includes: generating a modified prediction block as a weighted average of the first prediction block and the second prediction block. Performing processing of the current block based on the first prediction block includes: performing processing on a boundary region of the current block, wherein the boundary region of the current block includes an upper boundary region of the current block having a plurality of top rows and/or a left boundary region having a left column. Two different representative scaled motion vectors are generated for the upper and left boundary regions based on different neighboring blocks. Either of the two different scaled motion vectors is used for the upper left boundary region. In some cases, the motion vector scaling process is skipped. Modifying the first prediction block based on the second prediction block includes: performing processing on one or more sub-blocks of the current block based on the motion information of at least one neighboring block. This method is applied only when the current block is encoded using a sub-block technique. The sub-block technique includes: Advanced Temporal Motion Vector Prediction (ATMVP), Spatial-Temporal Motion Vector Prediction (STMVP), and affine mode, including affine inter mode and affine Merge mode.
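The weighted averaging of the first prediction block (from the current motion vector) and the second prediction block (from the representative scaled neighboring motion vector) can be sketched as below; the 3:1 weighting and the flat sample lists are illustrative assumptions, not values mandated by the description:

```python
def blend_prediction(pred_cur, pred_neigh, w_cur=3, w_neigh=1, shift=2):
    # Weighted average of the current-MV prediction and the neighbor-MV
    # prediction, as would be applied to samples in the boundary region.
    # The rounding offset 1 << (shift - 1) gives round-half-up behavior.
    return [(w_cur * c + w_neigh * n + (1 << (shift - 1))) >> shift
            for c, n in zip(pred_cur, pred_neigh)]
```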
Fig. 22 is a flow chart of an example method of video processing. The method 2200 includes: during a transition between the current block and the bitstream representation of the current block, determining (at step 2202) a motion vector for a first sub-block within the current block; and performing (at step 2204) the conversion using an Overlapped Block Motion Compensation (OBMC) mode; wherein the OBMC mode generates a final prediction value of the first sub-block using an intermediate prediction value of the first sub-block based on the motion vector of the first sub-block and a prediction value of at least a second video unit adjacent to the first sub-block; wherein the sub-block size of the first sub-block is based on the block size, the block shape, the motion information, or the reference picture of the current block.
Fig. 23 is a flow chart of an example method of video processing. The method 2300 includes: during a transition between a current block in video data and a bitstream representation of the current block, generating (at step 2302) at least one sub-block from the current block based on a dimension of the current block; generating (at step 2304) different predictions for the at least one sub-block based on the different prediction lists; applying (at step 2306) early termination processing at the sub-block level to determine whether to apply a bidirectional optical flow (BDOF) processing tool to at least one sub-block; and performing (at step 2308) a conversion based on the application; wherein the BDOF processing tool generates a prediction offset based on at least one of the different predicted horizontal or vertical gradients.
Fig. 24 is a flow chart of an example method of video processing. The method 2400 includes: generating (at step 2402) a current motion vector for a current block during a transition between the current block and a bitstream representation of the current block in the video data; generating (at step 2404) one or more neighboring motion vectors for one or more neighboring blocks of the current block; deriving (at step 2406) a first type prediction for the current block based on the current motion vector; deriving (at step 2408) one or more second type predictions for the current block based on the one or more neighboring motion vectors; determining (at step 2410) whether to apply Local Illumination Compensation (LIC) to the first type of prediction or the second type of prediction based on characteristics of the current block or the neighboring block; and performing (at step 2412) a conversion based on the determination; wherein the LIC constructs a linear model with a plurality of parameters to refine the prediction based on the prediction direction.
Fig. 25 is a flow chart of an example method of video processing. The method 2500 includes: generating (at step 2502) a current motion vector for a current block during a transition between the current block and a bitstream representation of the current block in the video data; generating (at step 2504) one or more neighboring motion vectors for one or more neighboring blocks of the current block; deriving (at step 2506) a first type prediction for the current block based on the current motion vector; deriving (at step 2508) one or more second type predictions for the current block based on the one or more neighboring motion vectors; applying (at step 2510) generalized bi-directional prediction (GBi) to either the first type of prediction or the second type of prediction; and performing (at step 2512) a conversion based on the application; wherein the GBi comprises applying equal or unequal weights to different prediction directions of the first and second type of predictions based on GBi indexes of the weight list.
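For illustration only, the GBi weighting referenced above may be sketched as follows. The weight list {-2, 3, 4, 5, 10} in units of 1/8 is an assumption borrowed from a commonly cited GBi configuration; the description itself does not fix the weight list:

```python
GBI_WEIGHTS = [-2, 3, 4, 5, 10]  # assumed example weights, in units of 1/8

def gbi_predict(pred0, pred1, gbi_index):
    # P = ((8 - w1) * P0 + w1 * P1 + 4) >> 3, where w1 is selected by the
    # GBi index; index 2 (w1 = 4) reduces to an equal-weight average.
    w1 = GBI_WEIGHTS[gbi_index]
    w0 = 8 - w1
    return [(w0 * p0 + w1 * p1 + 4) >> 3 for p0, p1 in zip(pred0, pred1)]
```

Index 2 thus corresponds to the "equal weights" case of the method, and the remaining indexes to the "unequal weights" case.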
Fig. 26 is a flow chart of an example method of video processing. The method 2600 comprises: during a transition between the current video block and the bitstream representation of the current video block, determining (at step 2602) one or more prediction blocks for the current video block; and performing (at step 2604) a conversion based on the one or more prediction blocks at least by using Overlapped Block Motion Compensation (OBMC) and decoder-side motion vector derivation (DMVD), wherein DMVD applies refinement to the motion vectors based on a sum of absolute differences between different prediction directions, or applies refinement to the predictions based on at least one of horizontal or vertical gradients of the different predictions, and wherein the OBMC derives a refined prediction based on a current motion vector of the current video block and one or more neighboring motion vectors of neighboring blocks.
Fig. 27 is a flow chart of an example method of video processing. The method 2700 includes: during a transition between a current block in the video data and a bitstream representation of the current video block, determining (at step 2702) availability of at least one neighboring sample of the current video block; generating (at step 2704) an intra prediction of the current video block based on availability of at least one neighboring sample; generating (at step 2706) an inter prediction of the current block based on at least one motion vector; deriving (at step 2708) a final prediction for the current block based on a weighted sum of the intra prediction and the inter prediction; and performing (at step 2710) a transition based on the final prediction.
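A minimal sketch of the weighted sum of intra and inter predictions in method 2700 follows; the 1:3 weights are assumed for illustration only, since the description leaves the weighting unspecified:

```python
def combined_prediction(intra_pred, inter_pred, w_intra=1, w_inter=3, shift=2):
    # Final prediction as a rounded weighted sum of the intra prediction
    # (derived from available neighboring samples) and the inter prediction
    # (derived from at least one motion vector).
    return [(w_intra * a + w_inter * b + (1 << (shift - 1))) >> shift
            for a, b in zip(intra_pred, inter_pred)]
```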
In some implementations, additional modifications may be performed to method 2200. For example, the conversion generates the current block from the bitstream representation, or the conversion generates the bitstream representation from the current block. The current block has a width w and a height h, and if w×h is greater than or equal to a first threshold T1, the size of the first sub-block is M1xM2; and if w×h is less than the first threshold T1, the size of the first sub-block is N1xN2, where M1, M2, w, h, N1, N2, and T1 are integers. The current block has a width w and a height h, and if w is greater than or equal to a second threshold T2, the width-to-height ratio w/h of a first sub-block of the current block is M1; and if w is less than the second threshold T2, the width-to-height ratio w/h of the first sub-block is N1, where M1, N1, and T2 are integers. If the current block is a uni-directional prediction block, the size M1xM2 of the first sub-block is used; otherwise, the size N1xN2 of the first sub-block is used. M1xM2 is 4x4. For the upper region, M1xM2 is (w/4)x4, and for the left region, M1xM2 is 4x(h/4). For the upper region, M1xM2 is (w/4)x2, and for the left region, M1xM2 is 4x(h/2). N1xN2 is 8x8, 8x4, or 4x8. For the upper region, N1xN2 is (w/2)x4, and for the left region, N1xN2 is 4x(h/2). For the upper region, N1xN2 is (w/2)x2, and for the left region, N1xN2 is 2x(h/2). The method is disabled in affine mode. The method is applied to translational motion modes. If the product w×h of the width and height of the current block is greater than or equal to a third threshold T3, the method is applied to the current block, where T3 is an integer. T3 is 16 or 32. If the width w of the current block is greater than or equal to a fourth threshold T4 and the height h is greater than or equal to the fourth threshold T4, the method is applied to the current block, where T4 is an integer. T4 is 8.
If the width w of the current block is greater than or equal to a fifth threshold T5 and the height h is greater than or equal to a sixth threshold T6, the method is applied to the current block, where T5 and T6 are integers. T5 and T6 are integer multiples of 8, and T5 may be the same as or different from T6. If the width w of the current block is greater than or equal to a seventh threshold T7, or the height h is greater than or equal to an eighth threshold T8, the method is not applied to the current block, where T7 and T8 are integers. T7 and T8 are 128. The current block is encoded using an Intra Block Copy (IBC) mode, wherein the IBC mode uses the picture containing the current block as a reference picture. The second video unit is coded using the IBC mode. The method is applied to all color components, or to one or more color components. In one example, the method is applied only to the luminance component. Whether and how to apply the method is signaled from the encoder to the decoder in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a picture header, a slice group, a slice header, a Codec Tree Unit (CTU), a Codec Unit (CU), a CTU group, or a CU group.
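The threshold-driven choice of OBMC sub-block size described for method 2200 can be sketched as follows; the threshold T1 = 64 and the concrete sizes are assumptions drawn from the example values in the text (4x4 for M1xM2, 8x8 for N1xN2):

```python
def obmc_subblock_size(w, h, uni_pred=False, T1=64):
    # Uni-directionally predicted blocks use size M1xM2; otherwise the
    # choice falls back to the area threshold: M1xM2 when w*h >= T1,
    # N1xN2 when w*h < T1 (N1xN2 could also be 8x4 or 4x8 per the text).
    if uni_pred:
        return (4, 4)                          # M1 x M2
    return (4, 4) if w * h >= T1 else (8, 8)   # M1xM2 vs. N1xN2
```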
Some features that are preferably implemented by some embodiments are now disclosed in a clause-based format.
1. A video processing method, comprising:
determining at least one neighboring block of the current block of visual media data during a transition between the current block and a corresponding codec representation of the visual media data;
determining motion information of at least one neighboring block; and
performing Overlapped Block Motion Compensation (OBMC) on the current block based on the motion information of at least one neighboring block,
wherein the OBMC comprises: generating a final predictor of one sub-block of the current block using an intermediate predictor of the sub-block and a predictor of at least one neighboring sub-block.
2. The method of clause 1, wherein performing OBMC on the current block based on the motion information of at least one neighboring block comprises:
OBMC is performed on all sub-blocks of the current block based on motion information of at least one neighboring block.
3. The method of clause 1 or 2, wherein the at least one neighboring block comprises a first neighboring block located above the current block and a second neighboring block located to the left of the current block.
4. The method of any of clauses 1-3, wherein at least one neighboring block and the current block are from different pictures of the visual media data.
5. The method according to any of clauses 1 to 4, wherein the method is applied only when the current block is not encoded using the sub-block technique.
6. A video processing method, comprising:
determining at least one neighboring block of the current block of visual media data during a transition between the current block and a corresponding codec representation of the visual media data;
determining motion information of at least one neighboring block; and
modifying motion information of the current block based on the motion information of the at least one neighboring block to generate modified motion information of the current block;
the processing of the current block is performed based on the modified motion information.
7. The method of clause 6, wherein modifying the motion information of the current block based on the motion information of at least one neighboring block to generate modified motion information of the current block comprises:
the motion information of the current block is modified based on the motion information of the at least one neighboring block and the motion information of the current block to generate modified motion information of the current block.
8. The method of clause 6 or 7, wherein modifying the motion information of the current block comprises:
scaling the motion information of at least one neighboring block to the same reference picture of the current block; and modifying the motion information of the current block based on the scaled motion information of the at least one neighboring block and the motion information of the current block.
9. The method of clause 8, wherein the scaled motion information of at least one neighboring block is weighted averaged or averaged to generate one representative scaled motion vector for each reference picture list of the current block.
10. The method of clause 9, wherein the modified motion information of the current block is generated as a weighted average of the representative scaled motion vector and the motion vector of the current block.
11. The method of clause 10, wherein the modified motion vector is calculated as: avgMv = (w1 × neighScaleMvLX + w2 × currMvLX + offset) >> N,
where w1, w2, offset, and N are integers, avgMv is the modified motion vector, neighScaleMvLX is the representative scaled motion vector, currMvLX is the motion vector of the current block, and X is a reference picture list index, where X = 0, 1.
12. The method of clause 11, wherein w1 is 1, w2 is 3, N is 2, and offset is 2, or wherein w1 is 1, w2 is 7, N is 3, and offset is 4.
13. The method of any of clauses 6 to 12, wherein performing the processing of the current block based on the motion information of at least one neighboring block comprises:
processing is performed on the boundary region of the current block,
Wherein the bounding region of the current block comprises a plurality of top rows and/or left columns of the current block.
14. The method of clause 13, wherein the representative motion vectors are generated for the top row of the current block and the left column of the current block using different neighboring blocks, respectively.
15. The method according to any of clauses 6 to 14, wherein the method is applied at the sub-block level only when the current block is encoded using a sub-block technique.
16. The method of any of clauses 6 to 15, wherein the method is not performed on the boundary region of the current block when at least one neighboring block of the boundary region is intra-coded.
17. The method of any of clauses 6 to 16, wherein when at least one neighboring block is intra-coded, the method further comprises:
checking neighboring blocks and/or non-neighboring blocks until an inter-coded block is found, and
disabling the motion vector modification process in response to no inter-coded block being found.
18. The method of clause 17, wherein the non-neighboring blocks comprise upper and/or upper left and/or upper right neighboring blocks of the top bounding region of the current block and the non-neighboring blocks comprise upper left and/or lower left neighboring blocks of the left bounding region of the current block.
19. The method of clause 17, wherein the non-adjacent blocks comprise upper and/or upper left and/or upper right and/or left and/or lower left adjacent blocks.
20. The method of clause 17, wherein the non-adjacent blocks are examined in descending order of distance between the non-adjacent blocks and corresponding blocks within the boundary region.
21. The method of any of clauses 17-20, wherein a subset of non-adjacent blocks or a number of non-adjacent blocks is examined, the number not being greater than a threshold K.
22. The method of any of clauses 17-21, wherein the width of the upper right and upper left regions is W/2 and the height of the lower left region is H/2, wherein W and H are the width and height of the current block as a codec unit.
23. The method of any of clauses 17-22, wherein the method is performed in each active prediction direction when at least one neighboring block/non-neighboring block and the current block are bi-directionally predicted or uni-directionally predicted from a reference list.
24. The method of any of clauses 17-22, wherein the modified motion information is generated for the first list when at least one neighboring block/non-neighboring block is uni-directionally predicted from the first list and the current block is bi-directionally predicted, or when at least one neighboring block/non-neighboring block is bi-directionally predicted and the current block is uni-directionally predicted from the first list.
25. The method of clause 24, wherein the modified motion information is not generated.
26. The method of any of clauses 17-22, wherein the modified motion information is not generated when the at least one neighboring block/non-neighboring block and the current block are unidirectional predicted and predicted from different directions.
27. The method of any of clauses 17-22, wherein when at least one neighboring block/non-neighboring block and the current block are uni-directionally predicted and predicted from different directions, the motion vector of the neighboring block/non-neighboring block is scaled to a reference picture of the current block and modified motion information is generated.
28. A video processing method, comprising:
determining a plurality of neighboring blocks of the current block of visual media data during a transition between the current block of visual media data and a corresponding codec representation of the visual media data;
determining motion information of a plurality of adjacent blocks;
determining a first prediction block of the current block based on motion information of the current block;
determining a second prediction block of the current block based on motion information of the plurality of neighboring blocks;
modifying the first prediction block based on the second prediction block; and
processing of the current block is performed based on the first prediction block.
29. The method of clause 28, wherein the motion information of one of the plurality of neighboring blocks is scaled to the reference picture of the current block to generate representative scaled motion information that is used to determine a second predicted block for the current block.
30. The method of clause 29, wherein modifying the first prediction block further comprises:
the modified prediction block is generated as a weighted average of the first prediction block and the second prediction block.
31. The method of clause 30, wherein performing the processing of the current block based on the first prediction block comprises:
processing is performed on the boundary region of the current block,
wherein the boundary region of the current block comprises an upper boundary region of the current block having a plurality of top rows and/or a left boundary region having a left column.
32. The method of clause 31, wherein two different representative scaled motion vectors are generated for the upper and left bounding regions based on different neighboring blocks.
33. The method of clause 32, wherein any one of two different scaled motion vectors is used for the upper left bounding region.
34. The method of any of clauses 28 to 33, wherein the motion vector scaling process is skipped.
35. The method of any of clauses 28-33, wherein modifying the first prediction block based on the second prediction block comprises:
processing is performed on one or more sub-blocks of the current block based on the motion information of at least one neighboring block.
36. The method of any of clauses 28 to 35, wherein the method is applied only when the current block is encoded using a sub-block technique.
37. The method of any of clauses 1-36, wherein the sub-block technique comprises: Advanced Temporal Motion Vector Prediction (ATMVP), Spatial-Temporal Motion Vector Prediction (STMVP), and affine mode, including affine inter mode and affine Merge mode.
38. A video processing method, comprising:
determining a motion vector of a first sub-block within the current block during a transition between the current block and a bitstream representation of the current block;
performing conversion using an Overlapped Block Motion Compensation (OBMC) mode;
wherein the OBMC mode generates a final prediction value of the first sub-block using an intermediate prediction value of the first sub-block, and a prediction value of at least a second video unit adjacent to the first sub-block, wherein the intermediate prediction value is based on the motion vector of the first sub-block;
wherein the sub-block size of the first sub-block is based on the block size, the block shape, the motion information, or the reference picture of the current block.
39. The method of clause 38, wherein converting generates the current block from the bitstream representation.
40. The method of clause 38, wherein converting generates a bitstream representation from the current block.
41. The method of any of clauses 38-40, wherein the current block has a width w and a height h, and the first sub-block has a size M1 x M2 if w x h is greater than or equal to a first threshold T1; and if w×h is less than the first threshold T1, the size of the sub-block is N1 x N2, where M1, M2, w, h, N1, N2, and T1 are integers.
42. The method of any of clauses 38-40, wherein the current block has a width w and a height h, and the width to height ratio w/h of the first sub-block of the current block is M1 if w is greater than or equal to the second threshold T2; and if w is less than the second threshold T2, the width to height ratio w/h of the first sub-block is N1, where M1, N1 and T2 are integers.
43. The method of clause 41, wherein if the current block is a unidirectional prediction block, the size M1xM2 of the first sub-block is used, otherwise the size N1xN2 of the first sub-block is used.
44. The method of clause 41 or 43, wherein M1xM2 is 4x4.
45. The method of clause 41 or 43, wherein for the upper region, M1xM2 is (w/4) x4, and for the left region, M1xM2 is 4x (h/4).
46. The method of clause 41 or 43, wherein for the upper region, M1xM2 is (w/4) x2, and for the left region, M1xM2 is 4x (h/2).
47. The method of clause 41 or 43, wherein N1xN2 is 8x8, 8x4, or 4x8.
48. The method of clause 41 or 43, wherein for the upper region, N1xN2 is (w/2) x4, and for the left region, N1xN2 is 4x (h/2).
49. The method of clause 41 or 43, wherein for the upper region, N1xN2 is (w/2) x2, and for the left region, N1xN2 is 2x (h/2).
50. The method of any one of clauses 38 to 49, wherein the method is disabled in affine mode.
51. The method of any of clauses 38 to 50, wherein the method is applied to translational motion patterns.
52. The method of any one of clauses 38 to 51, wherein if the product w x h of the width and height of the current block is greater than or equal to a third threshold T3, the method is applied to the current block, where T3 is an integer.
53. The method of clause 52, wherein T3 is 16 or 32.
54. The method of any one of clauses 38 to 51, wherein if the width w of the current block is greater than or equal to the fourth threshold T4 and the height h is greater than or equal to the fourth threshold T4, the method is applied to the current block, wherein T4 is an integer.
55. The method of clause 54, wherein T4 is 8.
56. The method of any one of clauses 38 to 51, wherein if the width w of the current block is greater than or equal to a fifth threshold T5 and the height h is greater than or equal to a sixth threshold T6, the method is applied to the current block, wherein T5 and T6 are integers.
57. The method of clause 56, wherein T5 and T6 are integer multiples of 8, and T5 is the same or different from T6.
58. The method of any of clauses 38-51, wherein the method is not applied to the current block if the width w of the current block is greater than or equal to a seventh threshold T7, or the height h is greater than or equal to an eighth threshold T8, where T7 and T8 are integers.
59. The method of clause 58, wherein T7 and T8 are 128.
60. The method of clause 38, wherein the current block is encoded using an Intra Block Copy (IBC) mode, wherein the IBC mode uses the picture containing the current block as a reference picture.
61. The method of clause 60, wherein the second video unit is coded using the IBC mode.
62. The method of any one of clauses 38 to 61, wherein the method is applied to all color components.
63. The method of any one of clauses 38 to 61, wherein the method is applied to one or more color components.
64. The method of clause 63, wherein the method is applied only to the luminance component.
65. The method of any of clauses 1 to 64, wherein whether and how the method is applied is signaled from the encoder to the decoder in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a picture header, a slice group, a slice header, a Codec Tree Unit (CTU), a Codec Unit (CU), a CTU group or a CU group.
66. A video decoding apparatus comprising a processor configured to implement the method of any one or more of clauses 1-65.
67. A video encoding apparatus comprising a processor configured to implement the method of any one or more of clauses 1-65.
68. A computer program product having computer code stored thereon, which when executed by a processor causes the processor to implement the method of any of clauses 1 to 65.
Some features that are preferably implemented by some embodiments are now disclosed in another clause-based format.
1. A video processing method, comprising:
generating at least one sub-block from a current block in video data based on a dimension of the current block during a transition between the current block and a bitstream representation of the current block;
generating different predictions of the at least one sub-block based on different prediction lists;
applying an early termination process at a sub-block level to determine whether to apply a bidirectional optical flow (BDOF) processing tool to the at least one sub-block; and
performing the conversion based on the application;
wherein the BDOF processing tool generates a prediction offset based on at least one of the differently predicted horizontal or vertical gradients.
2. The method of clause 1, wherein the early termination processing at the sub-block level is based on a Sum of Absolute Differences (SAD) between different predictions of the at least one sub-block.
3. The method of clause 2, wherein the SAD is calculated based on partial positions of the different predictions.
4. The method of clause 2 or 3, wherein the BDOF processing tool is not applied based on the SAD being less than a threshold.
5. The method of clause 4, wherein the threshold is based on the dimension of the at least one sub-block.
6. The method of clause 1, wherein the BDOF processing tool applies the prediction offset to refine the different predictions to derive a modified prediction for the at least one sub-block.
7. The method of clause 1, further comprising:
the current block is divided into a plurality of sub-blocks and SAD is generated for each sub-block.
8. The method of clause 7, wherein determining whether to apply the BDOF processing tool to each sub-block is based on its own SAD without reference to the SADs of other sub-blocks.
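The sub-block-level early termination of clauses 1 to 8 can be sketched as follows; computing the SAD over every second sample stands in for the "partial positions" of clause 3, and the threshold handling is an assumption for illustration:

```python
def apply_bdof(pred0, pred1, threshold, step=2):
    # SAD between the two prediction lists, sampled at partial positions
    # (every `step`-th sample). BDOF is skipped (early termination) for
    # this sub-block when the SAD is below the threshold.
    sad = sum(abs(a - b) for a, b in zip(pred0[::step], pred1[::step]))
    return sad >= threshold
```

Per clause 8, each sub-block would make this decision from its own SAD, without reference to the SADs of the other sub-blocks.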
9. A video processing method, comprising:
generating a current motion vector for a current block in video data during a transition between the current block and a bitstream representation of the current block;
generating one or more neighboring motion vectors of one or more neighboring blocks of the current block;
deriving a first type prediction of the current block based on the current motion vector;
deriving one or more second type predictions for the current block based on the one or more neighboring motion vectors;
determining whether to apply Local Illumination Compensation (LIC) to the first type prediction or the second type prediction based on characteristics of the current block or the neighboring block; and
performing the conversion based on the determination;
wherein the LIC constructs a linear model with a plurality of parameters to refine the prediction based on the prediction direction.
10. The method of clause 9, wherein, in a case where the LIC is applied to the first and second type predictions, the linear model of the first and second type predictions is derived based on the current motion vector.
11. The method of clause 9, wherein, in the case of applying the LIC to the first and second type predictions, the linear model of the first and second type predictions is derived based on at least one of the neighboring motion vectors.
12. The method of clause 9, wherein the linear model of the first type of prediction is derived based on the current motion vector if the LIC is applied to the first type of prediction, and the linear model of the second type of prediction is derived based on at least one of the neighboring motion vectors if the LIC is applied to the second type of prediction.
13. The method of any of clauses 9 to 12, wherein a different linear model is derived for the different second type of prediction based on the corresponding neighboring motion vector.
14. The method of clause 9, wherein the LIC is not applied to the second type of prediction.
15. The method of any of clauses 9-14, wherein the bitstream representation includes a flag corresponding to the current block and the neighboring block to indicate whether to enable the LIC.
16. The method of clause 15, wherein the LIC is not applied to the second type of prediction if the corresponding flag indicates that the LIC is disabled.
17. The method of any of clauses 9-16, wherein the LIC is not applied to any of the second type predictions if the number of flags of the neighboring blocks indicating that LIC is disabled is greater than or equal to a threshold.
18. The method of any of clauses 9-17, wherein the LIC is applied to all of the second type predictions if the number of flags of the neighboring blocks indicating that LIC is enabled is greater than or equal to a threshold.
19. The method of any of clauses 9-18, wherein the LIC is not applied to the first type of prediction and all of the second type of prediction if the flag of the current block indicates that LIC is disabled.
20. The method of any of clauses 9-19, wherein the LIC is applied to the second type of prediction if the flag of the corresponding neighboring block indicates that LIC is enabled, even though the flag of the current block indicates that LIC is disabled.
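The LIC linear model of clause 9 refines each prediction sample p as a*p + b. As a minimal sketch, the parameters (a, b) are derived here by a least-squares fit between neighboring reconstructed samples and their motion-compensated reference counterparts; the template shape, the floating-point fit, and the 8-bit clipping range are assumptions for illustration, since real codecs use integer arithmetic.

```python
import numpy as np

def derive_lic_params(neigh_rec: np.ndarray, neigh_ref: np.ndarray):
    # Least-squares fit: reconstructed template ≈ a * reference template + b.
    x = neigh_ref.astype(np.float64)
    y = neigh_rec.astype(np.float64)
    a, b = np.polyfit(x, y, 1)
    return a, b

def apply_lic(pred: np.ndarray, a: float, b: float) -> np.ndarray:
    # Refine the prediction with the linear model and clip to 8-bit range.
    return np.clip(a * pred + b, 0, 255).round().astype(np.uint8)

# Toy template: the reference template is uniformly 10 levels darker than
# the reconstructed neighborhood, so the fit recovers a ≈ 1, b ≈ 10.
ref_t = np.array([90, 100, 110, 120])
rec_t = ref_t + 10
a, b = derive_lic_params(rec_t, ref_t)
print(apply_lic(np.array([100, 105]), a, b))  # ≈ [110 115]
```

Under clauses 10-13, the template samples fed to `derive_lic_params` would be located either by the current motion vector or by a neighboring motion vector, yielding one linear model per prediction type.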
21. A video processing method, comprising:
generating a current motion vector for a current block in video data during a conversion between the current block and a bitstream representation of the current block;
generating one or more neighboring motion vectors of one or more neighboring blocks of the current block;
deriving a first type prediction of the current block based on the current motion vector;
deriving one or more second type predictions for the current block based on the one or more neighboring motion vectors;
applying generalized bi-prediction (GBi) to the first type of prediction or the second type of prediction; and
performing the conversion based on the application;
wherein the GBi comprises applying equal or unequal weights to different prediction directions of the first and second type predictions based on a GBi index into a weight list.
22. The method of clause 21, wherein the same GBi index is used for both the first type prediction and the second type prediction.
23. The method of clause 22, wherein the same GBi index is the GBi index of the current block.
24. The method of clause 21, wherein different GBi indices are used for the first type of prediction and the second type of prediction.
25. The method of clause 21, wherein a default GBi index is used for the first type of prediction and the second type of prediction.
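The GBi weighting of clauses 21-25 can be sketched as below. The example weight list {-2, 3, 4, 5, 10}/8 mirrors the one used by VVC's bi-prediction-with-CU-weights tool and should be treated as illustrative; the clauses themselves do not fix the list.

```python
GBI_WEIGHTS = [-2, 3, 4, 5, 10]  # w1 in eighths; w0 = 8 - w1

def gbi_blend(p0: int, p1: int, gbi_index: int) -> int:
    """Blend the list-0 and list-1 predictions per the selected GBi index."""
    w1 = GBI_WEIGHTS[gbi_index]
    w0 = 8 - w1
    # Weighted average with rounding: (w0*P0 + w1*P1 + 4) >> 3.
    return (w0 * p0 + w1 * p1 + 4) >> 3

print(gbi_blend(100, 100, 2))  # equal weights (4/8, 4/8) -> 100
print(gbi_blend(80, 120, 4))   # weight 10/8 on list-1 overshoots toward P1
```

Clauses 22-25 then differ only in which `gbi_index` is fed to the blend: the current block's index for both prediction types, distinct indices per type, or a default index.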
26. A video processing method, comprising:
determining one or more prediction blocks of a current video block during a conversion between the current video block and a bitstream representation of the current video block; and
based on the one or more prediction blocks, performing the conversion at least by using Overlapped Block Motion Compensation (OBMC) and decoder-side motion vector derivation (DMVD),
wherein the DMVD applies refinement to the motion vectors based on a sum of absolute differences between different prediction directions, or applies refinement to the predictions based on at least one of horizontal or vertical gradients of the different predictions,
wherein the OBMC derives a refined prediction based on a current motion vector of the current video block and one or more neighboring motion vectors of the neighboring block.
27. The method of clause 26, further comprising:
the conversion is performed by using the DMVD and then using the OBMC.
28. The method of clause 26 or 27, further comprising:
the conversion is performed by modifying one or more prediction samples using the OBMC to obtain modified one or more prediction samples, and then using the modified one or more prediction samples for the DMVD.
29. The method of clause 28, wherein the output of the DMVD is used as a prediction of the current video block.
30. The method of clause 26 or 27, further comprising:
the conversion is performed by modifying one or more prediction samples using the OBMC to obtain modified one or more prediction samples, then using the modified one or more prediction samples only for the DMVD, and then modifying the output of the DMVD using one or more prediction blocks generated by the OBMC.
31. The method of any one of clauses 26 to 30, further comprising:
based on the one or more prediction blocks, the conversion is performed without applying the DMVD to one or more neighboring blocks.
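The two processing orders of clauses 27-30 can be made concrete with a schematic pipeline. `dmvd_refine` and `obmc_blend` below are placeholder stand-ins for the real refinement stages, and the 1/4 boundary blending weight is an assumption; only the ordering of the stages reflects the clauses.

```python
def dmvd_refine(pred: float) -> float:
    # Placeholder: motion-vector or sample refinement would go here.
    return pred

def obmc_blend(pred: float, neighbor_pred: float, w: float = 0.25) -> float:
    # Blend in the neighbor-MV prediction near the block boundary.
    return (1 - w) * pred + w * neighbor_pred

def convert_dmvd_then_obmc(pred: float, neighbor_pred: float) -> float:
    # Clause 27: DMVD first, then OBMC on the refined prediction.
    return obmc_blend(dmvd_refine(pred), neighbor_pred)

def convert_obmc_then_dmvd(pred: float, neighbor_pred: float) -> float:
    # Clauses 28-29: OBMC-modified samples are fed to DMVD, whose output
    # serves as the prediction of the current video block.
    return dmvd_refine(obmc_blend(pred, neighbor_pred))
```

Clause 30 is the hybrid: OBMC-modified samples drive only the DMVD refinement, after which the DMVD output is modified again using the OBMC-generated prediction blocks.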
32. The method of any one of clauses 1 to 31, further comprising:
determining whether to use one or more neighboring motion vectors of one or more neighboring blocks based on the size and/or motion information of the one or more neighboring blocks.
33. The method of clause 32, wherein determining not to use the neighboring motion vector of the neighboring block is responsive to the neighboring block having a dimension of one of 4x4, 4x8, 8x4, 4x16, and 16x4.
34. The method of clause 32, wherein determining not to use the neighboring motion vector of the neighboring block is responsive to the neighboring block having a size of one of 4x4, 4x8, 8x4, 4x16, and 16x4 and the neighboring block being bi-directionally predicted.
35. The method of any of clauses 9-20, wherein the converting is performed without using the second type of prediction based on determining that the LIC is applied to the first type of prediction.
36. The method of any of clauses 9-20, wherein the converting is performed by using the second type of prediction based on applying unidirectional prediction LIC to the first type of prediction.
37. The method of any of clauses 1 to 36, wherein whether to use the second type of prediction is determined from coded information of the current block and/or one or more neighboring blocks.
38. The method of clause 37, wherein the coded information includes a block dimension and a prediction direction.
39. A video processing method, comprising:
determining availability of at least one neighboring sample of a current video block during a conversion between the current video block and a bitstream representation of the current video block;
generating an intra prediction for the current video block based on the availability of the at least one neighboring sample;
generating an inter prediction of the current block based on at least one motion vector;
deriving a final prediction of the current block based on a weighted sum of the intra prediction and the inter prediction; and
the conversion is performed based on the final prediction.
40. The method of clause 39, wherein a neighboring sample is deemed unavailable if the neighboring sample is from an intra-coded neighboring block.
41. The method of clause 39, wherein a neighboring sample is deemed unavailable if the neighboring sample is from an inter-coded neighboring block.
42. The method of clause 39, wherein a neighboring sample is deemed unavailable if the neighboring sample is from a neighboring block coded in a combined inter-intra prediction (CIIP) mode;
wherein the CIIP mode applies weights to an intra prediction and an inter prediction.
43. The method of clause 39, wherein a neighboring sample is deemed unavailable if the neighboring sample is from a neighboring block coded in a Current Picture Reference (CPR) mode;
wherein the CPR mode uses a block vector pointing to the picture to which the neighboring block belongs.
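The weighted sum of clause 39 and the availability rules of clauses 40-43 can be sketched together. The weight table below (more intra-coded available neighbors leading to a heavier intra weight) follows the pattern of VVC's CIIP weighting but is illustrative; the clauses do not prescribe specific weights.

```python
def ciip_weight(left_intra: bool, above_intra: bool) -> int:
    """Intra weight out of 4, steered by neighboring-sample availability."""
    n = int(left_intra) + int(above_intra)
    return {0: 1, 1: 2, 2: 3}[n]

def final_prediction(p_intra: int, p_inter: int,
                     left_intra: bool, above_intra: bool) -> int:
    wt = ciip_weight(left_intra, above_intra)
    # Weighted sum with rounding: ((4 - wt)*inter + wt*intra + 2) >> 2.
    return ((4 - wt) * p_inter + wt * p_intra + 2) >> 2

print(final_prediction(120, 80, True, True))    # intra-heavy blend
print(final_prediction(120, 80, False, False))  # inter-heavy blend
```

Samples deemed unavailable under clauses 40-43 would simply be excluded when forming the intra prediction, which in this sketch surfaces as the boolean availability flags.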
44. The method of any of clauses 1-43, wherein the converting generates the current block from the bitstream representation.
45. The method of any of clauses 1-44, wherein the converting generates the bitstream representation from the current block.
46. A video processing apparatus comprising a processor configured to implement the method of any one of clauses 1 to 45.
47. The apparatus of clause 46, wherein the apparatus is a video encoder.
48. The apparatus of clause 46, wherein the apparatus is a video decoder.
49. A computer-readable recording medium having stored thereon a program comprising code for causing a processor to implement the method of any one of clauses 1 to 45.
Implementations of the subject matter and the functional operations described herein can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. The computer program does not necessarily correspond to a file in a file system. A program may be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
The specification, together with the drawings, is intended to be considered exemplary only, where exemplary means an example. Furthermore, unless the context clearly indicates otherwise, the use of "or" is intended to include "and/or".
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features of particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various functions that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination and the combination of the claims may be directed to a subcombination or variation of a subcombination.
Also, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Furthermore, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples are described, and other implementations, enhancements, and variations may be made based on what is described and illustrated in this patent document.