JP6727642B2

JP6727642B2 - Focus correction processing method by learning algorithm

Info

Publication number: JP6727642B2
Application number: JP2016090290A
Authority: JP
Inventors: 力松永
Original assignee: KABUSHIKI KAISYA HOUEI
Current assignee: KABUSHIKI KAISYA HOUEI
Priority date: 2016-04-28
Filing date: 2016-04-28
Publication date: 2020-07-22
Anticipated expiration: 2036-04-28
Also published as: JP2017199235A

Description

The present invention relates to a focus correction processing method using a learning algorithm.

To improve the visibility of the low-resolution viewfinder image used for checking video, Non-Patent Document 1 below proposes a method that multiplexes a focus-assist signal onto the image, so that focus can be adjusted for Super Hi-Vision video even with a low-resolution viewfinder.

Focus correction by image processing has a long history as image restoration in many fields and has been studied extensively. Restoration of astronomical images is described in detail in Non-Patent Document 2 below.

Non-Patent Document 1: Ryohei Funatsu, Takayuki Yamashita, Koji Mitani, Yuji Nojiri, "Focus Auxiliary Signal for Super Hi-Vision Cameras," The Journal of the Institute of Image Information and Television Engineers, 65-4 (April 2011), 531-539.
Non-Patent Document 2: J.-L. Starck and F. Murtagh, Astronomical Image and Data Analysis, Springer, 2006.
Non-Patent Document 3: W. H. Richardson, "Bayesian-based iterative method of image restoration," Journal of the Optical Society of America, 62-1, pp. 55-59 (1972).
Non-Patent Document 4: L. B. Lucy, "An iterative technique for the rectification of observed distributions," Astronomical Journal, 79-6, pp. 745-754 (1974).

JP 2010-061541 A
JP 2014-099048 A

Even with a method that multiplexes a focus-assist signal onto the image, focus adjustment still relies on visual inspection and is therefore not guaranteed to be sufficient. Most image-processing approaches are reconstruction-based and iterative. A representative example is the Richardson-Lucy algorithm, but iterative restoration incurs an enormous processing cost. Measures such as reducing the number of iterations or fixing it to a constant have been tried, but processing high-definition video in real time remains impossible.

The Wiener filter is known as the filter that performs deblurring restoration that is optimal in the least-squares sense, but optimal restoration requires the power spectra of the true image and of the noise component. Needing information about the true image in order to restore the true image is a chicken-and-egg problem. Such information about the image and noise can sometimes be obtained approximately, but it usually becomes an empirically tuned parameter. Because the processing uses the pixels of the entire image, a frame memory is required. In addition, because the processing is performed in the frequency domain, the processing cost is high. The present invention has been made in view of the above problems, and its object is to propose a focus correction processing method using a learning algorithm that processes high-definition video in real time.

The present invention uses a convolutional neural network, a learning algorithm, for deblurring restoration. A smoothed input image that simulates defocus is generated from a true image (the expected output image). Using such input/output image pairs as training data, the network parameters are estimated (learned) in advance so that the sum of squared differences between the true image and the restored image, obtained by processing the smoothed input image with the network, is minimized. The learned parameters are selected so that the average ISNR (improvement in signal-to-noise ratio) of the deblurring results on evaluation images, which are different from the training images, is maximized. Focus correction is then performed using the convolutional neural network with the learned parameters determined in this way.

The present invention makes it possible to provide a focus correction processing method using a learning algorithm that processes high-definition video in real time.

FIG. 1 is a block diagram of a convolutional neural network that performs deblurring restoration for focus correction.
FIG. 2 is a block diagram illustrating the nonlinear enhancer processing in 1-pass video super-resolution.
FIG. 3 is a block diagram illustrating the image (video) degradation process, used to explain the Wiener filter.
FIG. 4 is a diagram showing the operation of the nonlinear enhancer processing.
FIG. 5 is a block diagram of a convolutional neural network that performs deblurring for focus correction.
FIG. 6 shows the Kodak color evaluation images (Kodak Lossless True Color Image Suite, http://r0k.us/graphics/kodak/) used for training.
FIG. 7: (a) shows the learning curve (an enlarged view of the overall curve shown at the upper right), and (b) shows the average ISNR [dB] of the restoration results for the training images (training) and the evaluation images (test) versus the number of iterations.
FIG. 8 shows examples of deblurring results on evaluation images using the learned parameters; from left to right: the Gaussian-smoothed input image (σ = 1.0), the image restored by deblurring, and the true image (expected output image).
FIG. 9: (a) visualizes as an image the input convolution weight parameters of one feature map of the learned parameters. The parameter images of all feature maps look almost the same, although their actual magnitudes differ (they look alike when each feature map is normalized by the maximum and minimum of its parameters). (b) is a three-dimensional plot of the corresponding two-dimensional frequency response (computed from the parameters normalized by their sum).
FIG. 10 shows the average ISNR [dB] of the restoration results for 18 evaluation images (Kodak Lossless True Color Image Suite, http://r0k.us/graphics/kodak/) versus Gaussian smoothing (σ = 0.8 to 1.2); results with normal noise of level σ_N = 0.5 and 1.0 added to the pixel values are also shown.
FIG. 11 is a diagram illustrating a sine wave half-wave rectified by the ReLU activation function (references [5, 14]).

(Outline of the Invention 1)
To correct defocus in 4K/8K (Super Hi-Vision) ultra-high-definition video, deblurring is performed by a convolutional neural network, a learning algorithm. A smoothed input image that simulates defocus is generated from a true image (the expected output image). Using such input/output image pairs as training data, the network parameters are estimated (learned) in advance so that the sum of squared differences between the true image and the restored image, obtained by processing the smoothed input image with the network, is minimized. Processing by the convolutional neural network with the learned parameters can be implemented on a CPU, GPU, or FPGA and, compared with conventional methods, achieves lower processing cost, low frame delay through local-region processing, high deblurring performance, and noise robustness.

As described above, deblurring restoration by the convolutional neural network is a nonlinear enhancer that is optimal in the least-squares sense on the training images, and it achieves deblurring accuracy comparable to that of the Wiener filter. Although it operates only on local regions, it is a good approximation of the Wiener filter, and it is more robust than the Wiener filter to noise contained in the image (video). The convolutional neural network can be implemented on a CPU, GPU, or FPGA, and is an effective method for real-time processing of high-definition video.

(Outline of the Invention 2)
The present invention performs image (video) restoration with a convolutional neural network, a learning algorithm. The network parameters for generating the expected output image are learned in advance. The parameters adopted are those that maximize the average ISNR on evaluation images different from the training images. Using evaluation images that differ from the training images prevents overfitting, in which the parameters fit the training images excessively, and yields the generalization ability needed to achieve high deblurring performance on images (video) other than the training images.
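The text does not spell out how ISNR is computed. The following is a minimal sketch, assuming the common definition ISNR = 10 log10(Σ(f − g)² / Σ(f − f̂)²), where f is the true image, g the blurred input, and f̂ the restored image. The helper names `isnr_db`, `mean_isnr_db`, and `select_best` are hypothetical, and images are flattened to plain lists for brevity.

```python
import math


def isnr_db(true, degraded, restored):
    """ISNR in dB; positive when `restored` is closer to `true` than `degraded` is.

    Assumed definition: 10*log10(sum((f-g)^2) / sum((f-f_hat)^2)).
    Raises an error if the restoration is exact (zero residual).
    """
    err_in = sum((t - d) ** 2 for t, d in zip(true, degraded))
    err_out = sum((t - r) ** 2 for t, r in zip(true, restored))
    return 10.0 * math.log10(err_in / err_out)


def mean_isnr_db(restore, eval_pairs):
    """Average ISNR of the function `restore` over (true, degraded) pairs."""
    return sum(isnr_db(t, g, restore(g)) for t, g in eval_pairs) / len(eval_pairs)


def select_best(candidates, eval_pairs):
    """Name of the candidate with the highest mean ISNR on held-out images."""
    return max(candidates, key=lambda name: mean_isnr_db(candidates[name], eval_pairs))
```

In this sketch each candidate would be the network evaluated with one saved parameter set, and `eval_pairs` would hold images that were never used for training, which is what guards against overfitting.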

The invention comprises an input convolution layer (feature maps) constituting the convolutional neural network, a nonlinear activation function that follows the input convolution layer, and an output layer that integrates the results of the input convolution layer and the nonlinear activation function, together with images serving as training data and a learning method for estimating the parameters of each layer of the network. The parameters are determined so that the average ISNR is maximized on evaluation images that differ from the training images used to determine the parameters of each layer.

The method can be implemented as a hardware device that processes baseband video signals, or as software that processes MXF files together with a computer-based device that executes it. With a device that converts MXF files to baseband video signals or vice versa, any configuration is possible. Compressed camera video or MXF files can also be transmitted over IP (Internet Protocol) and processed in the cloud. Various system forms are conceivable, for example decoding IP-transmitted compressed video into a baseband video signal, applying focus correction, and then recompressing the result for stream distribution.

Deblurring accuracy is expected to improve by increasing the number of input convolution layers (feature maps), or by making the network deeper by adding, between the input convolution layer and the output layer, multiple hidden layers and nonlinear activation functions that connect the layers. By preparing training images to which the noise expected in the image (video) has been added, the network is also expected to acquire noise-removal capability.

FIG. 1 is a block diagram of a convolutional neural network that performs deblurring restoration for focus correction. The input image $g_{i,j}$ is convolved with a kernel of (2L+1)×(2L+1) pixels. The results of M such input convolution layers (feature maps) each undergo a nonlinear level operation by a nonlinear activation function. The output layer forms a weighted sum of the activation results, and applying the output clip function to this sum gives the final output image.
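The data flow of FIG. 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the function name `cnn_deblur_forward`, the edge-replication border handling, and the default clip level of 255 are assumptions that the text does not specify.

```python
def relu(x):
    """Nonlinear level operation: pass positive values, clip negatives to zero."""
    return x if x > 0.0 else 0.0


def cnn_deblur_forward(image, kernels, weights, bias, x_max=255.0):
    """Two-layer forward pass: M convolutions -> ReLU -> weighted sum -> clip.

    image:   H x W list of lists (the input image g)
    kernels: M kernels, each (2L+1) x (2L+1)
    weights: M feature-map weights used by the output layer
    bias:    scalar bias added in the output layer
    """
    height, width = len(image), len(image[0])
    L = len(kernels[0]) // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = bias
            for kernel, v in zip(kernels, weights):
                s = 0.0
                for k in range(-L, L + 1):
                    ii = min(max(i + k, 0), height - 1)  # edge replication (assumption)
                    for l in range(-L, L + 1):
                        jj = min(max(j + l, 0), width - 1)
                        s += kernel[k + L][l + L] * image[ii][jj]
                acc += v * relu(s)
            out[i][j] = min(max(acc, 0.0), x_max)  # output clip (range assumed)
    return out
```

Because every output pixel depends only on a (2L+1)×(2L+1) neighborhood, the loop body needs a few line buffers rather than a full frame memory, which is the property the text relies on for low frame delay.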

For comparison with deblurring restoration by the convolutional neural network, the nonlinear enhancer processing used in 1-pass video super-resolution and the Wiener filter, known as the optimal restoration filter in the least-squares sense, are described below.

FIG. 2 is a block diagram illustrating the nonlinear enhancer processing in 1-pass video super-resolution, described here for the one-dimensional case. Edge components of the input signal are detected with a DoG (Difference of Gaussians) filter, harmonic components are restored from them by a nonlinear level operation, and the result is added to the input signal. To suppress excessive enhancement, adaptive clipping is also applied, in which the maximum and minimum pixel values in the neighborhood of the input are found and used as clip levels. FIG. 3 is a block diagram illustrating the image (video) degradation process, used to explain the Wiener filter.

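The Wiener filter is the filter that finds the estimate $\hat{f}$ of the true image $f$ such that

$$E\left[\,|f - \hat{f}|^{2}\,\right] \rightarrow \min .$$

In the frequency domain,

$$\hat{F}(u,v) = \frac{H^{*}(u,v)}{|H(u,v)|^{2} + \Gamma(u,v)}\, G(u,v),$$

where $G(u,v)$ and $H(u,v)$ are the Fourier transforms of the observed image and of the blur, $H^{*}(u,v)$ is the complex conjugate of $H(u,v)$, and the adjustment function is

$$\Gamma(u,v) = \frac{S_{n}(u,v)}{S_{f}(u,v)} .$$

Here $S_{n}(u,v)$ and $S_{f}(u,v)$ are the power spectral densities of the noise component and of the true image, respectively.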
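As a rough illustration of the frequency-domain Wiener filter, the sketch below applies the standard form F̂ = H* G / (|H|² + Γ) to a one-dimensional signal with circular blur. It assumes a constant Γ, which matches the remark that the noise-to-signal term usually ends up as an empirically tuned parameter. The naive DFT and the names `wiener_deblur` and `circular_blur` are illustrative choices, not from the source.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_blur(f, h):
    """Degradation model without noise: g = h (*) f with circular convolution."""
    n = len(f)
    return [sum(h[k] * f[(i - k) % n] for k in range(n)) for i in range(n)]


def wiener_deblur(g, h, gamma):
    """Wiener restoration with a constant adjustment term `gamma` (assumption).

    h is the blur kernel zero-padded to len(g) with its centre at index 0.
    gamma = 0 reduces to the inverse filter and fails where H is exactly zero.
    """
    G, H = dft(g), dft(h)
    F_hat = [Hk.conjugate() / (abs(Hk) ** 2 + gamma) * Gk for Hk, Gk in zip(H, G)]
    return [value.real for value in idft(F_hat)]
```

Note that the whole signal enters every output sample through the transform, which is the frame-memory and processing-cost drawback the text attributes to this approach.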

(Detailed description: focus correction by a convolutional neural network for ultra-high-definition video)

(Abstract)
To correct defocus in 4K/8K (Super Hi-Vision (reference [15])) ultra-high-definition video, deblurring is performed with a convolutional neural network. The restoration performance and noise robustness of this deblurring are evaluated, and the results are compared with those of the nonlinear enhancer processing in 1-pass video super-resolution (reference [12]) and of the Wiener filter (reference [21]), which minimizes the squared error with respect to the true image.

(1 Introduction)
Test broadcasting of 4K as next-generation television began on June 2, 2014 on CS (communication satellite) and cable television (Next Generation Broadcasting Promotion Forum (NexTV-F), http://www.nextv-f.jP/). Including 8K (Super Hi-Vision (reference [15])), work is accelerating toward the start of regular broadcasting in 2018 (or as early as possible) (Ministry of Internal Affairs and Communications, "Follow-up Meeting on the 4K/8K Roadmap (6th meeting), handouts", July 2015, http://www.soumu.go.jp/main_sosiki/kenkyu/4k8kroadmap/02ryutsu11_03000046.html).

Repurposing HD content for 4K/8K broadcasting requires resolution conversion. Super-resolution has been actively studied in recent years (reference [16]). Most of these methods are iterative, but the present inventor proposed 1-pass video super-resolution, which combines a weighted average of interpolations along local spatio-temporal directions of the image with a multi-scale nonlinear enhancer (reference [12]). Zhao and Matsunaga (reference [23]) accelerated 1-pass video super-resolution using a GPU.

Shooting 4K/8K ultra-high-definition video demands strict focus adjustment. As resolution increases, however, the optical size becomes larger, the pixel size of the image sensor becomes smaller, and the depth of field becomes shallower, so focus adjustment becomes far more difficult. It is not uncommon for defocus to be noticed only after shooting.

Funatsu et al. (reference [6]) proposed a method that, to improve the visibility of the low-resolution viewfinder image used for checking video, multiplexes a focus-assist signal onto the image so that focus can be adjusted for Super Hi-Vision video with a low-resolution viewfinder.

In the present invention, deblurring by a convolutional neural network is performed to correct defocus in 4K/8K ultra-high-definition video. The similarity between this deblurring and the nonlinear enhancer processing in 1-pass video super-resolution is pointed out, and a Fourier series expansion of the output of the ReLU (Rectified Linear Unit) activation function (references [5, 14]), which performs the nonlinear level operation in the network, shows that harmonic components are generated. The restoration performance and noise robustness of the deblurring are then evaluated, and the results are compared with those of the nonlinear enhancer processing in 1-pass video super-resolution (reference [12]) and of the Wiener filter (reference [21]), which minimizes the squared error with respect to the true image.

Convolutional neural networks (references [5, 10]) have recently attracted renewed attention as deep learning, but neural networks as models of information processing in the brain and nervous system have a long history that goes back to the formal neuron of McCulloch and Pitts (reference [13]) and the perceptron of Rosenblatt (reference [18]). They spread explosively in the 1980s after Rumelhart et al. (reference [19]) rediscovered error backpropagation as a learning rule for multilayer perceptrons. (Amari had already proposed a learning rule for perceptrons with hidden layers (reference [1]). The convolutional neural network used as standard in image recognition by deep learning (reference [9]) is itself the neocognitron (reference [5]) of Fukushima, who was then at the NHK Broadcasting Science Research Laboratories (now the NHK Science & Technology Research Laboratories). The ReLU activation function (reference [14]), which is said to make learning faster than the sigmoid function, was also already used there, although it was not called ReLU. LeCun et al. (reference [10]) trained a convolutional neural network by error backpropagation to recognize handwritten postal codes. The contributions of Japanese researchers to early neural network research deserve wider recognition.)

Deep learning with convolutional neural networks has been studied mainly for image recognition (reference [9]), but it is also used for image processing such as denoising, deblurring, and super-resolution (references [7, 4, 3, 22]). Deblurring also has a long history as image restoration in many fields (reference [20] gives a detailed account for astronomical images) and has been studied extensively, but reconstruction-based methods are iterative (reference [2]).

The description is organized as follows. Section 2 describes the nonlinear enhancer processing in 1-pass video super-resolution, Section 3 describes the configuration and training of the convolutional neural network, Section 4 presents the results of image simulation, and Section 5 gives a summary.

(2 Nonlinear enhancer in 1-pass video super-resolution)
In 1-pass video super-resolution, the present inventor further improved resolution by applying nonlinear enhancer processing based on image edge information as post-processing to the result of resolution conversion by intra-frame spatial directional interpolation (reference [12]). FIG. 4 shows the operation of the nonlinear enhancer processing.

A Difference of Gaussians (DoG) filter is used for edge detection. Let the kernel of the Gaussian smoothing filter used to compute the difference be

$$G_{\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right).$$

Then the DoG filter applied to an image $I(x)$ is

$$\mathrm{DoG}(x) = G_{\sigma_1}(x) * I(x) - G_{\sigma_2}(x) * I(x) \tag{2}$$

(for the one-dimensional case). Here $*$ denotes convolution and $\sigma_1 < \sigma_2$. (As $\sigma_1 \to 0$,

$$\mathrm{DoG}(x) = I(x) - G_{\sigma_2}(x) * I(x),$$

and adding the result of formula (2), multiplied by a suitable gain, to the original signal corresponds to so-called unsharp masking.) The DoG filter is a good approximation of the Laplacian of Gaussian (LoG) filter, the second derivative of the Gaussian smoothing filter, and is computationally efficient. For images, the processing can be separated into the horizontal and vertical directions. Like the Laplacian filter, it detects edges regardless of their direction.

The edge component detected by the DoG filter is passed through a nonlinear level operation that extends its high-frequency content, and the result is added to the original image. To suppress excessive enhancement caused by the nonlinear operation, the maximum and minimum input pixel values in the neighborhood of the pixel of interest are found and used for adaptive clipping. An example of the nonlinear level operation uses the sign function sgn(·) and a constant r of 2 or more. The present inventor has further extended this nonlinear enhancer to multiple scales (reference [12]) (details omitted).
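The enhancer pipeline described in this section (DoG edge detection, nonlinear level operation, addition to the input, adaptive clipping) can be sketched for the one-dimensional case as follows. The exact level operation is not reproduced in this text, so the sign-preserving power law used here, together with the parameter names `scale` and `clip_radius` and all default values, is a hypothetical stand-in.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian taps, truncated at 3 sigma."""
    radius = max(1, math.ceil(3.0 * sigma))
    taps = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def smooth(signal, kernel):
    """Convolve with a symmetric kernel, replicating the edge samples."""
    r, n = len(kernel) // 2, len(signal)
    return [sum(kernel[k + r] * signal[min(max(i + k, 0), n - 1)]
                for k in range(-r, r + 1))
            for i in range(n)]


def enhance(signal, sigma1=0.5, sigma2=1.0, r=2.0, scale=8.0, clip_radius=1):
    """DoG edge detection -> level operation -> add to input -> adaptive clip."""
    narrow = smooth(signal, gaussian_kernel(sigma1))
    wide = smooth(signal, gaussian_kernel(sigma2))
    out = []
    for i, x in enumerate(signal):
        edge = narrow[i] - wide[i]  # DoG edge component, sigma1 < sigma2
        # Hypothetical level operation: sgn(edge) * scale * (|edge| / scale) ** r
        boost = math.copysign(scale * (abs(edge) / scale) ** r, edge)
        window = signal[max(i - clip_radius, 0):i + clip_radius + 1]
        # Adaptive clip: never leave the local [min, max] of the input.
        out.append(min(max(x + boost, min(window)), max(window)))
    return out
```

On a blurred edge the values near the foot of the ramp are pushed down and those near the top are pushed up, while the adaptive clip keeps the result inside the local input range so that no overshoot appears.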

(3 Convolutional neural network)
FIG. 5 is a block diagram of a convolutional neural network that performs deblurring for focus correction. The network has the minimum configuration of two layers, which are given by

$$h^{(m)}_{i,j} = \phi\!\left(\sum_{k=-L}^{L}\sum_{l=-L}^{L} w^{(m)}_{k,l}\, g_{i+k,\,j+l}\right), \quad m = 1, \dots, M,$$

$$\hat{f}_{i,j} = \psi\!\left(\sum_{m=1}^{M} v_{m}\, h^{(m)}_{i,j} + b\right).$$

Here $g_{i,j}$ and $\hat{f}_{i,j}$ are the pixel values of the input and output images, $w^{(m)}_{k,l}$ are the input convolution weights of the m-th feature map, $v_{m}$ are the feature-map weights, and $b$ is the bias term. The activation function $\phi$ and the output clip function $\psi$ are

$$\phi(x) = \max(0, x),$$

$$\psi(x) = \min\bigl(\max(0, x),\, X_{\max}\bigr), \tag{8}$$

where $X_{\max}$ in formula (8) is the maximum value. The parameters of each layer are then estimated so as to minimize the objective function

$$J = \sum_{i,j} \left(\hat{f}_{i,j} - f_{i,j}\right)^{2},$$

where $f_{i,j}$ is the pixel value of the true image expected as the output image. The gradients of J with respect to the parameters of each layer and the derivative of the activation function are given in Appendix A.

The activation function φ(x) is the one known as ReLU (references [5, 14]). As in the nonlinear enhancer, the nonlinear level operation plays an important role in restoring high-frequency components. The ReLU activation function, which clips negative components, corresponds to a half-wave rectifier (diode), and a half-wave rectified sine wave contains harmonic components (see Appendix B).
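The harmonic generation can be checked numerically. The half-wave rectified sine has the well-known Fourier series max(sin θ, 0) = 1/π + (1/2) sin θ − (2/π) Σ_{k≥1} cos(2kθ) / (4k² − 1), so a DC term and even harmonics appear that are absent from the input. The sketch below, with the illustrative helper `harmonic_amplitudes`, measures these amplitudes from one sampled period.

```python
import cmath
import math


def harmonic_amplitudes(samples, count):
    """Amplitudes of the DC term and the first `count` harmonics of one period."""
    n = len(samples)
    amps = [abs(sum(samples) / n)]  # DC term
    for k in range(1, count + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        amps.append(2.0 * abs(coeff) / n)
    return amps


N = 256
sine = [math.sin(2.0 * math.pi * t / N) for t in range(N)]
rectified = [max(0.0, s) for s in sine]  # ReLU applied to the sine wave

print(harmonic_amplitudes(sine, 4))       # only the fundamental is present
print(harmonic_amplitudes(rectified, 4))  # DC, 2nd and 4th harmonics appear
```

The measured amplitudes of the rectified wave should be close to 1/π for DC, 1/2 for the fundamental, 2/(3π) for the second harmonic, and 2/(15π) for the fourth, with the third harmonic vanishing.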

All parameters whose values are to be determined are numbered consecutively and collected into a vector $u$ of dimension (2L+1)(2L+1)M + M + 1, consisting of the input convolution weights, the feature-map weights, and the bias term. Starting from an initial value $u^{(0)}$, $u$ is determined by stochastic gradient descent (reference [11]):

$$u^{(t+1)} = u^{(t)} - \lambda\, \nabla_{u} J\bigl(u^{(t)}\bigr),$$

where λ is a small learning coefficient. The update is iterated until convergence. With L = 3 and M = 8, the total number of parameters is 7 × 7 × 8 + 8 + 1 = 401. For training, the true image is used as the expected output image, and its Gaussian-smoothed version, corresponding to defocus, is used as the input image. In actual training, a suitable number of partial images at the same positions in the input and output images are extracted at random at each iteration (mini-batch learning (reference [11])).

To accelerate the parameter updates, the momentum method (references [19, 11]) can be used:

$$\Delta u^{(t)} = -\lambda\, \nabla_{u} J\bigl(u^{(t)}\bigr) + \mu\, \Delta u^{(t-1)}, \qquad u^{(t+1)} = u^{(t)} + \Delta u^{(t)}.$$

The momentum coefficient μ is set in the range 0 ≤ μ < 1. This can be regarded as a recursive filter on the parameters.
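A minimal sketch of one parameter update, assuming the standard momentum form Δu ← μ Δu − λ ∇J followed by u ← u + Δu. The function names `momentum_step` and `num_params` are illustrative, and the gradient is taken as given (its derivation is deferred to Appendix A in the text).

```python
def momentum_step(params, grads, velocity, lr, mu):
    """One update: v <- mu*v - lr*grad, then u <- u + v.

    Returns the new (params, velocity). With mu = 0 this is plain
    gradient descent; 0 < mu < 1 low-pass filters successive updates.
    """
    new_velocity = [mu * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [u + v for u, v in zip(params, new_velocity)]
    return new_params, new_velocity


def num_params(L, M):
    """Kernel weights, feature-map weights, and one bias term."""
    return (2 * L + 1) ** 2 * M + M + 1


# Toy usage: minimize J(u) = u^2 / 2, whose gradient is u itself.
u, v = [1.0], [0.0]
for _ in range(3):
    u, v = momentum_step(u, u, v, lr=0.1, mu=0.9)
```

The velocity term carries a decaying sum of past gradients, which is why the text can describe the scheme as a recursive filter.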

To stabilize training, the learning coefficient λ should be decreased exponentially with the number of iterations. For example, to use 1/10 of the initial learning coefficient $\lambda_0$ from 100 iterations onward and then, from 10,000 iterations onward, to decrease it exponentially from $\lambda_0/10$ so that it reaches $\lambda_0/100$ at 100,000 iterations, let $t$ be the iteration count and set

$$\alpha = \frac{\ln 10}{100000 - 10000},$$

$$\lambda(t) = \begin{cases} \lambda_0, & t < 100, \\ \lambda_0/10, & 100 \le t < 10000, \\ (\lambda_0/10)\, e^{-\alpha (t - 10000)}, & t \ge 10000. \end{cases} \tag{14}$$
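The schedule described above follows directly from its three stated anchor points. The sketch below is one way to write it, with the hypothetical function name `learning_rate`. The decay constant ln(10)/90000 is derived from the requirement that the rate fall from λ0/10 at 10,000 iterations to λ0/100 at 100,000 iterations, and the behaviour beyond 100,000 iterations (continued decay) is an assumption.

```python
import math


def learning_rate(t, lr0):
    """Learning coefficient at iteration t for initial value lr0."""
    if t < 100:
        return lr0
    if t < 10_000:
        return lr0 / 10.0
    # Exponential decay by a further factor of 10 over the next 90,000 iterations.
    alpha = math.log(10.0) / (100_000 - 10_000)
    return (lr0 / 10.0) * math.exp(-alpha * (t - 10_000))
```

With the initial coefficient 8 × 10^-6 used in the simulation, this gives 8 × 10^-7 between iterations 100 and 10,000 and approaches 8 × 10^-8 at iteration 100,000.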

(4. Image Simulation)
An image simulation is performed in which Gaussian-smoothed input images are deblurred by a convolutional neural network. The true image is used as the expected output image, and the image smoothed by a Gaussian filter with σ = 1.0 is used as the input image. At every iteration, 256 partial images (33 × 33 pixels) at the same positions in the input and output images were extracted at random from each image and used for training (mini-batch learning; reference [11]). FIG. 6 shows the Kodak color evaluation images (Kodak Lossless True Color Image Suite, http://r0k.us/graphics/kodak/) used for training. Only the G (green) channel is used in the experiments. The convolutional neural network has M = 8 feature maps and an input convolution kernel of 7 × 7 pixels (L = 3). As initial values for training, the input convolution weight parameters

were set to normal random numbers with mean 0 and standard deviation 0.01, the feature-map weight parameters

were set to uniform random numbers in [0.0, 0.1), and the bias term was set to b = 0.0. With the initial learning coefficient λ_0 = 8 × 10^-6 and the momentum coefficient μ = 0.9, the learning coefficient λ was controlled according to the number of iterations by equation (14).
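The initialization just described might look like the following sketch. The container layout (nested lists, one (2L+1) × (2L+1) kernel per feature map), the names, and the seed are assumptions made for illustration.

```python
import random


def init_parameters(L=3, M=8, seed=0):
    """Initial parameters as described in the text (layout and names assumed)."""
    rng = random.Random(seed)
    size = 2 * L + 1
    # Input convolution weights: normal random numbers, mean 0, std 0.01.
    kernels = [
        [[rng.gauss(0.0, 0.01) for _ in range(size)] for _ in range(size)]
        for _ in range(M)
    ]
    # Feature-map weights: uniform random numbers in [0.0, 0.1).
    map_weights = [0.1 * rng.random() for _ in range(M)]
    # Bias term b = 0.0.
    bias = 0.0
    return kernels, map_weights, bias
```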

As noted above, FIG. 6 shows the training images (Kodak Lossless True Color Image Suite, http://r0k.us/graphics/kodak/). The true image is used as the expected output image, and the image smoothed by a Gaussian filter with σ = 1.0 is used as the input image. At every iteration, 256 partial images (33 × 33 pixels) at the same positions in the input and output images were extracted at random from each image and used for training (mini-batch learning; reference [11]). Only the G (green) channel is used in the experiments. The frames drawn in the images indicate the size of the partial images used for training.

FIG. 7(a) shows the learning curve (an enlarged view of the overall curve shown at the upper right), and FIG. 7(b) shows the average ISNR [dB] of the restoration results for the training images (training) and the evaluation images (test) against the number of iterations. In both plots the horizontal axis is the number of iterations on a logarithmic scale. Error bars indicate the standard deviation. The average ISNR of the restoration results for the evaluation images was highest at 98,500 iterations.

As described above, FIG. 7(a) plots the residual J (equation (10)) of the training images against the number of iterations (the learning curve). FIG. 7(b) plots, for the parameters learned at each iteration count, the average ISNR (Improvement in SNR) between the restoration results and the true images, both for the training images (training) and for evaluation images (test) separate from the training images. The evaluation images were the 18 of the 24 Kodak color evaluation images (Kodak Lossless True Color Image Suite, http://r0k.us/graphics/kodak/) that were not among the 6 training images. ISNR measures how much the restoration processing improves the signal-to-noise ratio and is calculated as follows (reference [2]).

Here,

is the true image,

is the Gaussian-smoothed input image, and

is the deblurred (restored) image.
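As a sketch of the ISNR measure, the code below uses the commonly cited definition, 10 log10 of the ratio between the squared error of the blurred input and the squared error of the restored image, both measured against the true image. The formula in this text is not reproduced, so this definition and the argument names are assumptions.

```python
import math


def isnr_db(true_img, blurred_img, restored_img):
    """ISNR in dB for 2-D images given as lists of rows (assumed definition).

    Positive values mean the restored image is closer to the true image
    than the blurred input was. A perfect restoration would divide by zero.
    """
    num = sum(
        (f - g) ** 2
        for f_row, g_row in zip(true_img, blurred_img)
        for f, g in zip(f_row, g_row)
    )
    den = sum(
        (f - r) ** 2
        for f_row, r_row in zip(true_img, restored_img)
        for f, r in zip(f_row, r_row)
    )
    return 10.0 * math.log10(num / den)
```

If the "restored" image is identical to the blurred input, the ratio is 1 and the ISNR is 0 dB, which is the natural baseline for "no improvement".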

The average ISNR of both the training images and the evaluation images increases gradually as training proceeds, but the average ISNR of the evaluation images subsequently turns downward. Since the residual on the training images keeps decreasing, this is thought to be overfitting ("over-learning"). The parameters that maximize the average ISNR on the evaluation images are therefore taken as the final training result. In FIG. 7(b), the average ISNR of the restoration results for the evaluation images was highest at 98,500 iterations. FIG. 8 shows examples of deblurring evaluation images with the parameters learned in this way. From left to right, the panels are the Gaussian-smoothed input image (σ = 1.0), the image restored by deblurring, and the true image (expected output image). The ISNR of the deblurred images was 5.33/5.54 [dB]. The lower row of the figure shows binarized images of their frequency components obtained by FFT (threshold 100). Whereas the high-frequency components of the Gaussian-smoothed input image are limited, the result of deblurring by the convolutional neural network shows that the high-frequency components have been restored.
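The selection rule described here (keep the parameters whose average ISNR on the evaluation images is highest) can be sketched as follows. The `history` structure and the names are hypothetical; how checkpoints are stored during training is not specified in the text.

```python
def select_checkpoint(history):
    """Pick the checkpoint with the highest mean evaluation ISNR.

    `history` is a list of (iteration, params, per_image_isnr_list) tuples;
    this layout is an assumption made for the sketch.
    Returns (mean_isnr, iteration, params) of the best checkpoint.
    """
    best = None
    for iteration, params, isnrs in history:
        mean_isnr = sum(isnrs) / len(isnrs)
        # Strict comparison keeps the earliest checkpoint in case of a tie.
        if best is None or mean_isnr > best[0]:
            best = (mean_isnr, iteration, params)
    return best
```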

FIG. 9(a) visualizes, as an image, one feature map's input convolution weight parameters

taken from the learned parameters. For every feature map, the

parameter images look almost the same, but their actual magnitudes differ (they look alike once each feature map is normalized by the maximum and minimum of its parameters). FIG. 9(b) is a three-dimensional plot of the corresponding two-dimensional frequency response (computed after normalizing by the sum of the parameters). The frequency response shows that these are high-frequency emphasis filters, although each has a different high-frequency gain. In each feature map, the high-frequency-emphasized result has its negative components clipped by the ReLU activation function, and the weighted sum of these results is output-clipped to give the final output. The high-frequency components are considered to be restored by integrating the clipped results of high-frequency emphasis with several different gains. Depending on the training run, some feature maps had input convolution weight parameters

and feature-map weight parameters

that were almost 0. Because the initial parameter values are generated from random numbers and the training images are extracted at random at every iteration, training apparently did not proceed correctly for these feature maps.
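To illustrate how a frequency response normalized by the sum of the parameters can show high-frequency emphasis, the sketch below evaluates the 2-D discrete-space Fourier transform of a small kernel directly. The function name and the 3 × 3 sharpening kernel in the usage note are hypothetical and are not the learned parameters of the patent.

```python
import cmath
import math


def frequency_response(kernel, u, v):
    """Magnitude response of a square, odd-sized kernel at (u, v) cycles/pixel.

    The kernel is divided by the sum of its coefficients, so the gain at
    (0, 0) is 1. The sum must be non-zero.
    """
    total = sum(sum(row) for row in kernel)
    c = len(kernel) // 2  # index of the kernel center
    acc = 0j
    for j, row in enumerate(kernel):
        for i, w in enumerate(row):
            acc += w * cmath.exp(-2j * math.pi * (u * (i - c) + v * (j - c)))
    return abs(acc / total)
```

For the hypothetical kernel `[[0, -1, 0], [-1, 5, -1], [0, -1, 0]]`, the gain is 1 at (0, 0) and grows toward the highest frequencies, which is the signature of a high-frequency emphasis filter.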

Next, the restoration performance and noise tolerance of deblurring by the convolutional neural network with the learned parameters are evaluated. The network, trained on images smoothed by a Gaussian filter with σ = 1.0, is evaluated on the restoration of Gaussian-smoothed images with σ varied from 0.8 to 1.2 in steps of 0.1. In addition, although noise-free smoothed images were used for training, real images normally contain compression noise and imaging noise. The restoration accuracy is therefore also evaluated with normally distributed noise added to the pixel values, with mean 0 and standard deviation σ_N of 0.5 and 1.0. For comparison, restoration is also performed with the nonlinear enhancer of one-pass video super-resolution (reference [12]) and with the Wiener filter (reference [21]). The Wiener filter is described in Appendix C.
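The degraded test inputs for this evaluation could be generated along the following lines. This is only a sketch: the kernel radius of 3σ, the edge-replication boundary rule, and all names are assumptions, since the text does not state how the Gaussian filter was truncated or how image borders were handled.

```python
import math
import random


def gaussian_kernel(sigma, radius=None):
    """Unit-sum 1-D Gaussian kernel; the default radius ceil(3*sigma) is assumed."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Filter every row with `kernel`, replicating edge pixels (assumed rule)."""
    r = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(k * row[min(max(x + i - r, 0), n - 1)] for i, k in enumerate(kernel))
            for x in range(n)
        ])
    return out


def gaussian_blur(image, sigma):
    """Separable 2-D Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    transposed = [list(col) for col in zip(*horizontal)]
    vertical = blur_rows(transposed, kernel)
    return [list(col) for col in zip(*vertical)]


def add_noise(image, sigma_n, rng):
    """Add zero-mean normal noise with standard deviation sigma_n to each pixel."""
    return [[p + rng.gauss(0.0, sigma_n) for p in row] for row in image]


# Blur levels and noise levels named in the text.
SIGMAS = [round(0.8 + 0.1 * i, 1) for i in range(5)]
NOISE_LEVELS = [0.0, 0.5, 1.0]
```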

For the nonlinear enhancer, the DoG filter used for edge detection was set to σ_1 = 0, and a cubic function was used for the nonlinear operation on the level. Adaptive clipping was not used; instead, the clip level was treated as an adjustment parameter and optimized together with the enhancer gain γ and the DoG filter σ_2 so as to maximize the average ISNR on the training images. The downhill simplex method (Nelder-Mead method; reference [17]) was used for the optimization. For the Wiener filter as well, the point spread function was taken to be a Gaussian smoothing filter with σ = 1.0, and the parameter giving the ratio of the power spectra of the noise component and the true image was determined so as to maximize the average ISNR on the training images. Table 1 shows the restoration results for normal noise σ_N added to the Gaussian-smoothed images (σ = 1.0), as the average ISNR [dB] over the 18 evaluation images (Kodak Lossless True Color Image Suite, http://r0k.us/graphics/kodak/), with the standard deviation in parentheses.

The Wiener filter is optimal in the least-squares sense and restores the image exactly when the point spread function is the true one and there is no noise; in practice, however, even quantization noise affects the restoration result. FIG. 10 shows the average ISNR [dB] of the restoration results for the 18 evaluation images (Kodak Lossless True Color Image Suite, http://r0k.us/graphics/kodak/) under Gaussian smoothing with σ = 0.8 to 1.2. Results with normal noise of level σ_N = 0.5 and 1.0 added to the pixel values are also shown. Error bars indicate the standard deviation. When σ_N = 0.0 and the Gaussian smoothing is σ = 1.0, the restoration accuracy of the Wiener filter in FIG. 10(c) is at its maximum, but it drops when the Gaussian smoothing σ changes. It also drops as the noise level σ_N increases to 0.5 and 1.0. The variation of the restoration results is large as well, which shows that the tolerance to blur and noise is low.

The result for the convolutional neural network in FIG. 10(a) likewise has its highest restoration accuracy when σ_N = 0.0 and the Gaussian smoothing is σ = 1.0. Its behavior under blur and noise follows nearly the same trend as the Wiener filter, but compared with the Wiener filter it is more tolerant and varies less. The result for the nonlinear enhancer in FIG. 10(b), on the other hand, can hardly be said to have sufficient restoration accuracy. Nevertheless, although its restoration accuracy is low, its tolerance to blur and noise appears to be the highest.

Deblurring by the convolutional neural network has a restoration accuracy comparable to that of the Wiener filter and, despite being a local-region process, is a good approximation of the Wiener filter while being more robust to blur and noise.

(5. Summary)
Deblurring by a convolutional neural network was performed with the aim of correcting defocus in 4K/8K ultra-high-definition video. The similarity between deblurring by a convolutional neural network and the nonlinear enhancer processing of one-pass video super-resolution was pointed out, and a Fourier series expansion of the output of the ReLU activation function, which performs the nonlinear operation on the level in the convolutional neural network, showed that harmonic components are generated. The restoration performance and noise tolerance of deblurring by the convolutional neural network were then evaluated and compared with the results of the nonlinear enhancer processing of one-pass video super-resolution and of the Wiener filter, which is optimal in the least-squares sense.

Like the nonlinear enhancer processing of one-pass video super-resolution, deblurring by a convolutional neural network restores high-frequency components by applying a nonlinear operation on the level, through the ReLU activation function, to the edge components detected by the input convolution filters. It can therefore be regarded as a nonlinear enhancer that is optimal in the least-squares sense on the training images.

Future work includes improving restoration performance and noise tolerance by optimizing the network configuration and making it deeper, using GPUs to speed up training, and real-time processing of 4K/8K video through FPGA implementation. Convolutional neural networks are likely the most effective approach at present for applying the results of prior optimization by training to 4K/8K ultra-high-definition video in real time.

(References)
[1] S. Amari, A theory of adaptive pattern classifiers, IEEE Transactions on Electronic Computers, EC-16-3 (June 1967), 299-307.
[2] J. Biemond, R. L. Lagendijk, and R. M. Mersereau, Iterative methods for image deblurring, Proceedings of the IEEE, 78-5 (May 1990), 856-883.
[3] C. Dong, C.-C. Loy, K. He, and X. Tang, Learning a deep convolutional network for image superresolution, Proceedings of 13th European Conference on Computer Vision (ECCV2014), Part IV, Zurich, Switzerland, pp. 184-199 (September 2014).
[4] D. Eigen, D. Krishnan, and R. Fergus, Restoring an image taken through a window covered with dirt or rain, IEEE International Conference on Computer Vision (ICCV2013), Sydney, Australia, pp. 633-640 (December 2013).
[5] K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, 36-4 (April 1980), 193-202.
[6] R. Funatsu, T. Yamashita, K. Mitani, and Y. Nojiri, Focus auxiliary signal for Super Hi-Vision cameras, The Journal of the Institute of Image Information and Television Engineers, 65-4 (April 2011), 531-539.
[7] V. Jain and H. S. Seung, Natural image denoising with convolutional networks, Proceedings of Advances in Neural Information Processing Systems 21 (NIPS2008), pp. 769-776 (2008).
[8] K. Kanatani, "Applied Mathematics You Can Understand: From Least Squares to Wavelets" (in Japanese), Kyoritsu Shuppan, June 2003.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Proceedings of Advances in Neural Information Processing Systems 25 (NIPS2012), pp. 1106-1114 (2012).
[10] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation, 1-4 (December 1989), 541-551.
[11] Y. LeCun, L. Bottou, G. Orr, and K. Müller, Efficient BackProp, in G. Orr and K. Müller (Eds.), Neural Networks: Tricks of the Trade, Springer, 1998.
[12] C. Matsunaga, One-pass video super-resolution using spatio-temporal directional interpolation and a multi-scale nonlinear enhancer (in Japanese), Proceedings of the 20th Image Sensing Symposium (SSII2014), Yokohama (Pacifico Yokohama), June 2014.
[13] W. S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, The Bulletin of Mathematical Biophysics, 5-4 (December 1943), 115-133.
[14] V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, Proceedings of the 27th International Conference on Machine Learning (ICML10), Haifa, Israel, pp. 807-814 (June 2010).
[15] Japan Broadcasting Corporation (NHK), Special issue on "Super Hi-Vision video technology" (in Japanese), NHK STRL R&D, No. 137, January 2013.
[16] S. C. Park, M. K. Park, and M. G. Kang, Super-resolution image reconstruction: A technical overview, IEEE Signal Processing Magazine, 20-3 (May 2003), 21-36.
[17] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes 3rd Edition: The Art of Scientific Computing, Cambridge University Press, September 2007.
[18] F. Rosenblatt, The Perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, 65-6 (1958), 386-408.
[19] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, Vol. 2: Psychological and Biological Models, MIT Press, Cambridge, MA, USA, July 1986, July 1987. Japanese edition: S. Amari (supervising translator), "PDP Model: Cognitive Science and the Exploration of Neural Networks", Sangyo Tosho, March 1989.
[20] J.-L. Starck and F. Murtagh, Astronomical Image and Data Analysis, Springer, 2006.
[21] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications, John Wiley & Sons, Inc., New York, 1949.
[22] L. Xu, J. S. Ren, C. Liu, and J. Jia, Deep convolutional neural network for image deconvolution, Proceedings of Advances in Neural Information Processing Systems 27 (NIPS2014), pp. 1790-1798 (2014).
[23] Y. Zhao and C. Matsunaga, GPU acceleration of one-pass video super-resolution up-conversion supporting MXF files (in Japanese), GPU Technology Conference (GTC Japan 2014), 2014-8008, Tokyo (Tokyo Midtown Hall & Conference), July 2014.

(Appendix A: Gradients of the Convolutional Neural Network)
The gradients of the objective function J of equation (10) with respect to the parameters of each layer of the convolutional neural network are as follows.

Here, the derivative of the activation function

is as follows.
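Since the activation function used throughout this description is ReLU, its value and derivative can be sketched as below. Taking the derivative at exactly 0 to be 0 is a convention assumed here; the patent's own definition at that point is not visible in this text.

```python
def relu(x):
    """ReLU activation: passes positive inputs, clips negative inputs to 0."""
    return x if x > 0.0 else 0.0


def relu_grad(x):
    """Derivative of ReLU: 1 for x > 0, otherwise 0 (value at x = 0 assumed)."""
    return 1.0 if x > 0.0 else 0.0
```

During backpropagation this derivative simply passes the gradient through for units that were active and blocks it for units that were clipped.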

(Appendix B: Fourier Series Expansion of the ReLU Activation Function Output)
A periodic function f(t) with period T can be expanded into the following Fourier series of trigonometric functions (reference [8]).

Here, the angular frequency is ω = 2π/T, and the coefficients a_k and b_k on the right-hand side are as follows.

A sine wave half-wave rectified by the ReLU activation function (references [5, 14]) is

and its Fourier coefficients are found to be

Here, FIG. 8 is a diagram illustrating a sine wave half-wave rectified by the ReLU activation function (references [5, 14]). It can be seen that the sine wave half-wave rectified by the ReLU activation function contains even-order harmonic components.
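For reference, the expansion of a half-wave rectified sine wave can be worked out as follows. The convention with the constant term written as a_0/2 is an assumption (the equations of this appendix are not reproduced in the text), so the patent's coefficients may differ by that normalization; the conclusion about even-order harmonics does not depend on it.

```latex
% Assumed Fourier series convention
f(t) = \frac{a_0}{2} + \sum_{k=1}^{\infty} \left( a_k \cos k\omega t + b_k \sin k\omega t \right),
\qquad
a_k = \frac{2}{T} \int_0^T f(t) \cos k\omega t \, dt,
\quad
b_k = \frac{2}{T} \int_0^T f(t) \sin k\omega t \, dt .

% Half-wave rectified sine, f(t) = \max(0, \sin \omega t)
f(t) =
\begin{cases}
\sin \omega t, & 0 \le t \le T/2, \\
0, & T/2 < t < T .
\end{cases}

% Coefficients (substituting \theta = \omega t)
a_k = \frac{1}{\pi} \int_0^{\pi} \sin\theta \cos k\theta \, d\theta
    = \frac{1 + (-1)^k}{\pi \, (1 - k^2)} \quad (k \ne 1),
\qquad a_1 = 0,

b_1 = \frac{1}{\pi} \int_0^{\pi} \sin^2\theta \, d\theta = \frac{1}{2},
\qquad b_k = 0 \quad (k \ge 2).

% Resulting series: a DC term, the fundamental, and even harmonics only
f(t) = \frac{1}{\pi} + \frac{1}{2} \sin \omega t
     - \frac{2}{\pi} \sum_{m=1}^{\infty} \frac{\cos 2m\omega t}{4m^2 - 1} .
```

The factor 1 + (-1)^k vanishes for odd k, so apart from the fundamental only the harmonics at 2ω, 4ω, 6ω, and so on remain.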

(Appendix C: Wiener Filter)
The observed image g(x,y) is expressed as the true image

degraded by the point spread function h(x,y), with a noise component n(x,y) added, as follows.

Here, * denotes convolution. In the case of defocus, h(x,y) is approximated by a two-dimensional Gaussian function.
In the frequency domain, this becomes

and the Wiener filter W(u,v), which estimates the true image

from the observed image G(u,v) as

is given by

Here, H^* denotes the complex conjugate of H, and

where S_n and S_f are the power spectral densities of the noise component and the true image, respectively.
K(u,v) is determined by the true image and the noise component; an approximation of it is sometimes known, but it is usually an adjustment parameter specified as an empirical constant. When K = 0, W(u,v) = 1/H(u,v), which is the inverse filter. For large u and v, setting

suppresses the high-frequency components. The Wiener filter is the filter that minimizes the squared error with respect to the true image

(reference [21]).
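The behavior described in this appendix can be checked numerically with a small sketch. It assumes the widely used form W = H^* / (|H|^2 + K) with a constant K, and it models the transfer function of a Gaussian point spread function by the continuous-domain formula exp(-2π²σ²(u² + v²)); both choices, and the function names, are assumptions made for illustration rather than the patent's exact equations.

```python
import math


def gaussian_otf(u, v, sigma):
    """Transfer function of a unit-sum Gaussian PSF with std `sigma` (pixels).

    u and v are in cycles/pixel. Continuous-domain formula, used here as an
    approximation of the sampled filter.
    """
    return math.exp(-2.0 * math.pi ** 2 * sigma ** 2 * (u * u + v * v))


def wiener_gain(H, K):
    """Wiener filter gain W = conj(H) / (|H|^2 + K) for a constant K >= 0."""
    H = complex(H)
    return H.conjugate() / (abs(H) ** 2 + K)
```

With K = 0 the gain equals 1/H, the inverse filter; with K > 0 the gain at frequencies where |H| is small is held down, which is how amplification of high-frequency noise is limited.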

The present invention is also suitable for 4K/8K (Super Hi-Vision) ultra-high-definition video.

Claims

A focus correction processing method using a learning algorithm, the method comprising:
a step of, in order to use a convolutional neural network serving as the learning algorithm for deblurring restoration processing, generating from true images smoothed input images that simulate defocus, using these input/output images as training data, and estimating the parameters of the network so that the sum of squared differences between the true images and the restored images obtained by processing the smoothed input images with the network is minimized, wherein the learned parameters are determined so that the average ISNR of the deblurring restoration results for evaluation images different from the training images is maximized; and
a step of performing focus correction processing with the convolutional neural network based on the determined learned parameters.