WO2018043437A1

WO2018043437A1 - Image distance calculation device, and computer-readable non-transitory recording medium with an image distance calculation program recorded thereon

Info

Publication number: WO2018043437A1
Application number: PCT/JP2017/030813
Authority: WO
Inventors: 嶐一岡
Original assignee: University of Aizu
Current assignee: University of Aizu
Priority date: 2016-09-01
Filing date: 2017-08-28
Publication date: 2018-03-08
Anticipated expiration: 2019-03-01

Abstract

In an image distance calculation device (100), a CPU (104) extracts a frame image from a moving image captured by a camera, generates a slice image from the temporal change of the pixel column on the y-axis at point x0 of the frame image, calculates a spotting point based on the correspondence between pixels of the slice image and pixels of the frame image, finds the frame image pixels corresponding to the slice image pixels by a backtrace process, divides the frame image and the slice image into regions, determines the corresponding region of the frame image for each divided region of the slice image, calculates a ratio value from the average q of the number of pixels in the corresponding regions of the frame image and the average p of the number of pixels in the divided regions of the slice image, and calculates the distance z from the camera to the photographed object for each corresponding region using a predetermined distance function.

Description

Image distance calculation apparatus and computer-readable non-transitory recording medium storing image distance calculation program

The present invention relates to an image distance calculation device and to a computer-readable non-transitory recording medium on which an image distance calculation program is recorded.

The stereo vision method is a known technique that uses parallax between two simultaneously captured images to calculate the distance from the camera position to a photographed object (see, for example, Patent Document 1 and Patent Document 2). In the stereo vision method, two cameras separated by a fixed horizontal distance d photograph the same object at the same time. Because the two images are captured from positions that differ by the inter-camera distance d, the object appears slightly different in each. This difference is the parallax caused by the distance d. Therefore, by comparing the object in the two images and taking the difference in horizontal pixel position as the parallax, the distance to the object can be calculated from the following equation.

Distance to the object = (camera focal length × inter-camera distance d) ÷ parallax (horizontal pixel difference)

The same applies when the distance to the object is obtained from moving images captured by cameras. A pair of frame images captured at the same time is extracted from the moving images of the two cameras, and the parallax (horizontal pixel difference) is obtained from that pair. Applying the inter-camera distance d (the distance between the positions of the cameras that captured the two frame images) and the parallax to the equation above yields the distance to the object at each capture time.
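The stereo vision equation above can be illustrated with a minimal sketch. The function name and the example values are hypothetical, and the sketch assumes the focal length is expressed in pixels so that the result has the same unit as the inter-camera distance d.

```python
def stereo_distance(focal_length_px, baseline, disparity_px):
    """Distance = (focal length x inter-camera distance d) / parallax.

    focal_length_px: focal length in pixels (assumption for this sketch)
    baseline:        inter-camera distance d, in any length unit
    disparity_px:    horizontal pixel difference between the two images
    """
    if disparity_px <= 0:
        raise ValueError("parallax must be a positive pixel difference")
    return focal_length_px * baseline / disparity_px


# Hypothetical values: 800 px focal length, cameras 0.1 m apart, 4 px parallax.
distance = stereo_distance(800.0, 0.1, 4.0)
```

Because the parallax is in the denominator, halving the parallax doubles the calculated distance.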

Patent Document 1: JP 2008-309519 A
Patent Document 2: JP 2009-139995 A

However, calculating the distance to an object from the parallax between two images, as described above, requires the pixel difference of the object between the two images. That is, the correspondence of the same object must be established pixel by pixel across the two images, and the difference must be expressed explicitly as a pixel difference. Establishing such a per-pixel correspondence is not easy. Specifically, the same object must be matched in the two captured images and its pixels identified, and achieving this matching and pixel identification has required the use and adaptation of various image processing techniques.

Furthermore, when the two captured images are compared, the pixel difference between the images is small for a distant object and large for a nearby object. However, when the distance between the two cameras is about the spacing between a person's left and right eyes, the far and near pixel differences differ by only a few pixels (for example, a 1-pixel difference for a distant object and about a 4-pixel difference for a nearby one). As a result, distances from far to near can be distinguished in only about four steps, which makes it difficult to secure sufficient accuracy in the calculated distance.
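The coarse distance resolution described here can be reproduced with a small sketch. The focal length (800 px) and the camera spacing (0.065 m, roughly the spacing of human eyes) are hypothetical values chosen only so that the parallax ranges from 1 to 4 pixels; the function names are likewise illustrative.

```python
def quantized_distance(true_distance, focal_length_px, baseline):
    """Distance recovered after the parallax is rounded to whole pixels."""
    parallax_px = max(1, round(focal_length_px * baseline / true_distance))
    return focal_length_px * baseline / parallax_px


def distinct_levels(true_distances, focal_length_px, baseline):
    """Sorted set of distinct distances that the stereo pair can report."""
    return sorted({quantized_distance(z, focal_length_px, baseline)
                   for z in true_distances})


# Objects placed every metre from 13 m to 52 m (hypothetical scene).
levels = distinct_levels(range(13, 53), 800.0, 0.065)
print(len(levels))  # number of distinguishable distance steps
```

With these assumed values, forty different true distances collapse onto only four reported distances, one per whole-pixel parallax value.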

The pixel difference can be enlarged by increasing the inter-camera distance d. However, because the two cameras must photograph the same object at the same time, the inter-camera distance d is constrained to some extent, and a very long distance is difficult to secure. In addition, as the inter-camera distance d grows, the position and shape of the same object change between the two images, which makes pixel-level matching of that object difficult. Lengthening the inter-camera distance d has long been regarded as a problem to be solved in stereo vision. Because this problem is difficult to solve, a single target is currently photographed with a stereo camera from several tens to tens of thousands of times.

Moreover, because two cameras are needed to photograph the same object, various restrictions arise that do not apply to ordinary shooting with a single camera, and the burden of shooting is heavy.

The present invention has been made in view of the above problems, and its object is to provide an image distance calculation device that calculates the distance from a camera to a photographed object shown in video, and a computer-readable non-transitory recording medium on which an image distance calculation program is recorded.

To solve the above problems, a computer-readable non-transitory recording medium according to an embodiment of the present invention records an image distance calculation program for an image distance calculation device that calculates, based on a moving image captured by a single moving camera, the distance from the camera to a photographed object recorded in the moving image. The program causes a control unit of the image distance calculation device to realize the following functions.

A frame image extraction function extracts a frame image at an arbitrary time of the moving image.

A slice image generation function takes, in the frame image, the axis extending in the moving direction of the camera as the x-axis and the axis orthogonal to it as the y-axis, and extracts the temporal change of the pixel column on the y-axis at point x0 of the x-axis from time t0+1 to time t0+T, thereby generating a slice image whose vertical axis is the y-axis and whose horizontal axis is the t-axis (1 ≤ t ≤ T).

A spotting point calculation function operates as follows. Let g(t, y) be a pixel of the slice image at time t (1 ≤ t ≤ T), and let f(x, y′, t0) = r(x) be a pixel in xyt space at time t0 at point y′ (1 ≤ y′ ≤ Y) on the y-axis of the frame image. The function uses matching based on dynamic programming to find the frame image pixel r(x), located at some point in the interval [1, X] of x, that corresponds to the slice image pixel g(t, y), and thereby calculates, as a spotting point, the coordinates of the frame image pixel that corresponds to the slice image pixel at time T.

A pixel matching function performs backtrace processing from time t = T to time t = 1, starting from the spotting point calculated by the spotting point calculation function, and thereby obtains the frame image pixels that correspond to the respective slice image pixels from t = 1 to t = T on the t-axis.

A region division function applies the mean-shift method to both the frame image and the slice image, so that each image is divided into regions according to a common division criterion.

A corresponding region determination function takes the pixels within a divided region of the slice image, detects the frame image pixels that the pixel matching function found to correspond to them, finds the divided region of the frame image that contains the largest number of the detected pixels, and determines that region to be the corresponding region for the divided region of the slice image.

A global distance calculation function detects the average q of the number of pixels in the x-axis direction in each corresponding region of the frame image and the average p of the number of pixels in the t-axis direction in the matching divided region of the slice image, and calculates for each corresponding region a ratio value based on the ratio of q to p or of p to q. Using a distance function that defines in advance the correspondence between the ratio value and the distance from the camera to the photographed object shown in the frame image, it then calculates, for each corresponding region, the distance that corresponds to the calculated ratio value as a global distance.
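The slice image generation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the recorded program: the moving image is assumed to be a nested list indexed as video[t][y][x], the list index is assumed to stand for time, and the function name make_slice_image is hypothetical.

```python
def make_slice_image(video, x0, t0, T):
    """Build a slice image from the pixel column at x0 over times t0+1 .. t0+T.

    video[t][y][x] is the pixel at time t, row y, column x (assumed layout).
    The result is indexed slice_image[y][t - 1], so the vertical axis is y
    and the horizontal axis is t (1 <= t <= T).
    """
    height = len(video[t0])
    return [[video[t0 + t][y][x0] for t in range(1, T + 1)]
            for y in range(height)]


# Tiny synthetic video: 4 frames, 2 rows, 3 columns; pixel = 100*t + 10*y + x.
video = [[[100 * t + 10 * y + x for x in range(3)] for y in range(2)]
         for t in range(4)]
slice_image = make_slice_image(video, x0=1, t0=0, T=3)
```

Each column of the result is the same image column x0 observed at a later time, so horizontal structure in the slice image reflects how quickly scene content passes x0 as the camera moves.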

Also to solve the above problems, an image distance calculation device according to an embodiment of the present invention operates on a moving image captured by a single moving camera and includes the following units.

A frame image extraction unit extracts a frame image at an arbitrary time of the moving image.

A slice image generation unit takes, in the frame image, the axis extending in the moving direction of the camera as the x-axis and the axis orthogonal to it as the y-axis, and extracts the temporal change of the pixel column on the y-axis at point x0 of the x-axis from time t0+1 to time t0+T, thereby generating a slice image whose vertical axis is the y-axis and whose horizontal axis is the t-axis (1 ≤ t ≤ T).

A spotting point calculation unit operates as follows. Let g(t, y) be a pixel of the slice image at time t (1 ≤ t ≤ T), and let f(x, y′, t0) = r(x) be a pixel in xyt space at time t0 at point y′ (1 ≤ y′ ≤ Y) on the y-axis of the frame image. The unit uses matching based on dynamic programming to find the frame image pixel r(x), located at some point in the interval [1, X] of x, that corresponds to the slice image pixel g(t, y), and thereby calculates, as a spotting point, the coordinates of the frame image pixel that corresponds to the slice image pixel at time T.

A pixel matching unit performs backtrace processing from time t = T to time t = 1, starting from the spotting point calculated by the spotting point calculation unit, and thereby obtains the frame image pixels that correspond to the respective slice image pixels from t = 1 to t = T on the t-axis.

A region division unit applies the mean-shift method to both the frame image and the slice image, so that each image is divided into regions according to a common division criterion.

A corresponding region determination unit takes the pixels within a divided region of the slice image, detects the frame image pixels that the pixel matching unit found to correspond to them, finds the divided region of the frame image that contains the largest number of the detected pixels, and determines that region to be the corresponding region for the divided region of the slice image.

A global distance calculation unit detects the average q of the number of pixels in the x-axis direction in each corresponding region of the frame image and the average p of the number of pixels in the t-axis direction in the matching divided region of the slice image, and calculates for each corresponding region a ratio value based on the ratio of q to p or of p to q (a typical feature value, for each region, of the accumulated motion parallax produced by the moving camera). Using a distance function that defines in advance the correspondence between the ratio value and the distance from the camera to the photographed object shown in the frame image, it then calculates, for each corresponding region, the distance that corresponds to the calculated ratio value as a global distance.
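The spotting point calculation and the backtrace can be sketched in one dimension as follows. This is a simplified illustration, not the line-to-image algorithm of the embodiment: it aligns a single row of slice image intensities g with a single row of frame image intensities r, assumes the path starts at index 0 (standing in for x0), assumes x advances by 0 to max_step pixels per time step, and uses the absolute intensity difference as the local cost. The function name dp_spotting_match and all parameters are hypothetical.

```python
def dp_spotting_match(g, r, max_step=2):
    """Dynamic-programming match of slice row g[0..T-1] to frame row r[0..X-1].

    Returns (spot, path): spot is the frame index matched to the last slice
    pixel (free end point, i.e. the spotting point in this sketch), and
    path[t] is the frame index matched to g[t], recovered by backtrace.
    """
    T, X = len(g), len(r)
    INF = float("inf")
    cost = [[INF] * X for _ in range(T)]   # accumulated cost
    back = [[-1] * X for _ in range(T)]    # predecessor index for backtrace
    cost[0][0] = abs(g[0] - r[0])          # assumed fixed start point
    for t in range(1, T):
        for x in range(X):
            best_prev, best_cost = -1, INF
            for step in range(max_step + 1):
                px = x - step
                if px >= 0 and cost[t - 1][px] < best_cost:
                    best_prev, best_cost = px, cost[t - 1][px]
            if best_prev >= 0:
                cost[t][x] = best_cost + abs(g[t] - r[x])
                back[t][x] = best_prev
    # Spotting: the end point is not fixed; take the cheapest one at t = T-1.
    spot = min(range(X), key=lambda x: cost[T - 1][x])
    # Backtrace from the spotting point down to the first time step.
    path = [0] * T
    x = spot
    for t in range(T - 1, -1, -1):
        path[t] = x
        x = back[t][x]
    return spot, path


spot, path = dp_spotting_match([10, 20, 30, 40],
                               [10, 15, 20, 25, 30, 35, 40, 45])
```

Leaving the end point free is what distinguishes spotting from ordinary end-to-end alignment: the extent of the frame row that the slice row covers is found by the matching itself.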

With the image distance calculation device and the computer-readable non-transitory recording medium recording the image distance calculation program according to an embodiment of the present invention, the distance from the camera to the photographed object can be obtained for each divided region of the frame image. In particular, the distance for each divided region or each pixel of the frame image can be calculated from a moving image captured by a single camera. Compared with the conventional stereo vision method, in which two cameras held at a constant spacing d are used to shoot multiple times, this simplifies the shooting equipment and reduces the shooting burden.
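The per-region processing that precedes the distance lookup can be sketched as below: a majority vote assigns each slice image region its corresponding frame image region, and the ratio value is then formed from the average pixel counts. The data layout (label maps as nested lists, matches as coordinate pairs), the function names, and the choice of q/p rather than p/q are assumptions of this sketch. "Average number of pixels" is interpreted here as the mean per-row pixel count over rows that contain the region. The predetermined distance function is not reproduced, since its form is not given in this section.

```python
from collections import Counter


def corresponding_regions(slice_labels, frame_labels, matches):
    """Map each slice region to the frame region holding most matched pixels.

    slice_labels[y][t] and frame_labels[y][x] are region ids (assumed layout).
    matches is a list of ((t, y), (x, y_frame)) pixel correspondences.
    """
    votes = {}
    for (t, y), (x, y_frame) in matches:
        s_region = slice_labels[y][t]
        f_region = frame_labels[y_frame][x]
        votes.setdefault(s_region, Counter())[f_region] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


def mean_row_count(labels, region):
    """Mean number of pixels per row in `region`, over rows that contain it."""
    counts = [row.count(region) for row in labels if region in row]
    return sum(counts) / len(counts)


def ratio_value(slice_labels, frame_labels, s_region, f_region):
    p = mean_row_count(slice_labels, s_region)   # t-axis direction
    q = mean_row_count(frame_labels, f_region)   # x-axis direction
    return q / p


slice_labels = [[0, 0, 1, 1], [0, 0, 1, 1]]
frame_labels = [[5, 5, 5, 5, 7, 7], [5, 5, 5, 5, 7, 7]]
matches = [((0, 0), (0, 0)), ((1, 0), (2, 0)), ((2, 0), (4, 0)),
           ((3, 0), (5, 0)), ((1, 1), (4, 1))]
pairs = corresponding_regions(slice_labels, frame_labels, matches)
ratios = {s: ratio_value(slice_labels, frame_labels, s, f)
          for s, f in pairs.items()}
```

In this toy data, slice region 0 is matched to frame region 5 by two votes to one, and its ratio value is 4 / 2 = 2.0.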

In addition, because the distance of the corresponding region or pixel in the frame image can be calculated from a moving image captured by a single camera, the distance to the object in the situation and environment in which the moving image was shot can be calculated from a wide variety of moving images, such as footage shot in the past or footage shot for other purposes.

FIG. 1 is a block diagram showing the schematic configuration of the image distance calculation device according to the embodiment.
FIG. 2 is a flowchart showing the processing performed by the image distance calculation device according to the embodiment.
FIG. 3 is a diagram schematically showing the relationship between the camera and the photographed object in dynamic parallax (motion parallax).
FIG. 4 is a diagram for explaining a moving image as a three-dimensional space.
FIG. 5(a) shows an example of a frame image, and FIG. 5(b) shows an example of a slice image.
FIG. 6 is a table showing the differences among the accumulated parallax method according to the embodiment, the conventional stereo vision method, and the conventional Epipolar-Plane-Image method.
FIG. 7 is a diagram schematically showing, with black circles, the positions of the frame image pixels that correspond to slice image pixels.
FIG. 8(a) shows the frame image at time t = 1 in a moving image captured by a camera moving horizontally. FIG. 8(b) shows an example of a slice image created, at the point (line) x0 shown in FIG. 8(a), from the video from time t = 1 to t = 175.
FIG. 9 is a diagram showing the correspondence between the frame image and the slice image, schematically showing the relationship between x and time t for frame image pixels at a given y′ and the relationship between y and time t for slice image pixels g(t, y).
FIG. 10 is a diagram for explaining the algorithm of the line-to-image DP matching method (a DP matching algorithm with a spotting function).
FIG. 11(a) shows the frame image after the mean-shift method has been applied. FIG. 11(b) shows the slice image after the mean-shift method has been applied.
FIG. 12 is a schematic diagram for explaining the region correspondence between the slice image and the frame image.
FIG. 13(a) is a schematic diagram for explaining that the region correspondence between the slice image and the frame image is determined by a region of the slice image and the region of the frame image that has the most points corresponding to the pixels in that slice image region. FIG. 13(b) shows the relationship between the distance z and α_r, where α_r = q/p, q is the average horizontal-axis section length of the slice image region, and p is the average horizontal-axis section length of the corresponding frame image region.
FIG. 14 is an image showing the global distance of each divided region, calculated based on calibration data.
FIGS. 15(a) to 15(h) are diagrams for explaining the process of calculating distance data for each region of the frame image in sequence using a plurality of slice images.
FIG. 16(a) shows an image obtained by applying mosaicing to the plurality of images (distance image series) obtained from FIGS. 15(a) to 15(h). FIG. 16(b) shows, from different viewpoints, a 3D image in which the global distance has been calculated for each region based on the image of FIG. 16(a) and to which the RGB values of the pixels have been added.
FIG. 17 is a flowchart showing the first stitching process.
FIG. 18 is a diagram showing the RGB information of all pixels in two frame images assigned to RGB space.
FIG. 19 is a diagram showing one frame image in which the RGB values of some pixels have been replaced with the RGB values of codes.
FIG. 20 is a diagram showing the other frame image in which the RGB values of some pixels have been replaced with the RGB values of codes.
FIG. 21 is a diagram showing the stitched image after the mean-shift method has been applied.
FIG. 22 is a flowchart showing the second stitching process.
FIG. 23 is a diagram showing the correspondence between a plurality of pixels on the horizontal axis of the slice image and the corresponding plurality of pixels on the horizontal axis of the frame image.
FIG. 24 is a diagram for explaining the relationship between the dynamic parallax on the frame image for each of the horizontally adjacent slice image pixels and the accumulated dynamic parallax obtained by accumulating those dynamic parallaxes.
FIG. 25 is a diagram of the model used to derive the formula relating the accumulated dynamic parallax to the actual distance.
FIG. 26 is a diagram for explaining the method of calculating the region distance z_region(r) from the average horizontal-axis length in the slice image and the corresponding average horizontal-axis length in the frame image.
FIG. 27(a) shows the relationship between the variation parameters μ_1 and γ_1 of α_r and the distance z when the camera moves slowly. FIG. 27(b) shows the same relationship when the camera moves fast.
FIG. 28 is a diagram showing the relationship between the i-th pixel x(i) in a region and the detailed distance z(i) at pixel x(i).

An example of an image distance calculation device according to an embodiment of the present invention is described below in detail with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of the image distance calculation device. The image distance calculation device 100 includes a recording unit (pixel information recording unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory; pixel information recording unit) 103, and a CPU (Central Processing Unit; frame image extraction unit, slice image generation unit, spotting point calculation unit, pixel matching unit, region division unit, corresponding region determination unit, global distance calculation unit, local distance calculation unit, detailed distance calculation unit, control unit, code detection unit, pixel distance value extraction unit, code RGB value assignment unit, RGB value replacement unit, stitched image generation unit, RGB value detection unit, distance information addition unit, RGB value change unit, corrected stitched image generation unit, distance-added stitched image generation unit) 104. A camera 200 is connected to the image distance calculation device 100, and the moving image captured by the camera 200 is recorded in the recording unit 101. A monitor 210 is also connected to the image distance calculation device 100. The monitor 210 can display the moving image captured by the camera 200 as well as images such as those of FIG. 14, FIGS. 16(a) and 16(b), FIG. 19, FIG. 20, and FIG. 21, described later.

The recording unit 101 stores the moving image captured by the camera 200. More specifically, the moving image is stored as data in which a plurality of frame images are recorded in time series. For example, suppose the camera 200 captures a moving image from time 1 to time T. If the camera 200 can record one frame image every Δt, then T/Δt frame images are recorded in time series in the recording unit 101.
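As a quick worked example of the T/Δt frame count, the sketch below uses exact fractions so that an interval such as 1/30 of a second introduces no rounding error. The function name and the example values (10 seconds at 30 frames per second) are hypothetical.

```python
from fractions import Fraction


def frame_count(duration, frame_interval):
    """Number of whole frame images recorded in `duration` (T / delta t)."""
    return int(Fraction(duration) / Fraction(frame_interval))


# 10 seconds of video with one frame every 1/30 second.
print(frame_count(10, Fraction(1, 30)))  # 300
```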

The image distance calculation device 100 or the camera 200 may, for example, be provided with a frame buffer, so that each frame image recorded by the camera 200 is held temporarily in the frame buffer and the frame images held there are recorded in time series in the recording unit 101. Also, instead of the moving image being taken into the recording unit 101 in real time, a moving image captured beforehand by the camera 200 (a moving image shot in the past) may be recorded in the recording unit 101 as time-series data of a plurality of frame images.

The moving image captured by the camera 200 is not limited to digital video. For example, even an analog moving image can be used for the distance calculation processing in the image distance calculation device 100, provided that digital conversion allows its frame images to be recorded in time series in the recording unit 101.

The recording unit 101 is composed of a general hard disk or the like. Its configuration is not limited to a hard disk; it may be flash memory, an SSD (Solid State Drive / Solid State Disk), or the like. The specific configuration of the recording unit 101 is not particularly limited as long as it is a recording medium capable of recording a moving image as a plurality of time-series frame images.

Based on the plurality of frame images (video) recorded in time series in the recording unit 101, the CPU 104 performs processing to calculate, for each pixel of a frame image, the distance from the camera position to the object shown in the frame image (the object to be photographed). The CPU 104 performs this per-pixel distance calculation processing in accordance with a processing program described later (a program based on the flowcharts of FIGS. 2, 17, and 22).

The ROM 102 stores, among other things, a program for calculating, for each pixel of a frame image, the distance to the object to be photographed shown in the frame image. The CPU 104 performs the per-pixel distance calculation processing based on the program read from the ROM 102. The RAM 103 is used as a work area for the processing of the CPU 104.

In the image distance calculation device 100 according to the embodiment, the programs executed by the CPU 104 (the image distance calculation program (the flowchart shown in FIG. 2) and the pasting process program (the flowcharts shown in FIGS. 17 and 22)) are described as being recorded in the ROM 102; however, these programs may instead be recorded in the recording unit 101.

The camera 200 is a photographing means capable of shooting, through a lens, the scenery or the like in front of the camera as a moving image. The type and configuration of the camera 200 are not particularly limited as long as it can shoot a moving image. For example, it may be a general movie camera, or the camera function of a smartphone or the like may be used.

The monitor 210 can display, so as to be visible to the user, the moving image shot by the camera 200, images indicating the per-pixel distance obtained by the distance calculation processing (for example, the images of FIG. 14 and FIGS. 16(a) and 16(b) described later), and the like. A general display device such as a liquid crystal display or a CRT display is used as the monitor 210.

Next, a method by which the CPU 104 calculates the distance for each pixel of a frame image, based on the time-series data of the plurality of frame images recorded in the recording unit 101, will be described. FIG. 2 is a flowchart showing the image distance calculation processing (per-pixel distance calculation processing) performed by the CPU 104 of the image distance calculation device 100.

First, consider the case where the camera 200 shoots an object while moving at a constant speed v. FIG. 3 schematically shows the relationship between the camera 200 and the object to be photographed. FIG. 3 shows the case where the camera 200 shoots the object while moving at speed v from point A to point B over a time Δt. Let the position of the object be point S. The distance from point A to point B can be expressed as vΔt. Let Δθ be the angle between SA (the line connecting points S and A) and SB (the line connecting points S and B), and let θ be the angle between SA and AB (the line connecting points A and B). Further, let the length of SA = the length of SB = d. With these definitions, as shown in FIG. 3, the length of the perpendicular dropped from point B to SA can be expressed as vΔt sinθ. This length vΔt sinθ is approximately equal to the product dΔθ of the length d and the angle Δθ, so the following Equation 1 is obtained.

          Δθ = vΔt sinθ / d   ... Equation 1

As is clear from Equation 1, the longer the distance from the camera 200 to the object to be photographed (that is, the farther the object is from the camera 200), the smaller (narrower) the angle Δθ becomes. Conversely, the shorter the distance from the camera 200 to the object (that is, the closer the object is to the camera 200), the larger (wider) the angle Δθ becomes. In other words, as is also experienced in daily life, when one is moving and compares the apparent speed of things located to the side of the direction of travel, distant things move little and hardly shift sideways, whereas nearby things move a great deal and shift sideways at high speed.
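
The dependence of Δθ on distance in Equation 1 can be checked numerically. The following is a minimal sketch only; the function name `delta_theta` and all numeric values (camera speed, frame interval, distances) are assumptions chosen for illustration and do not come from the embodiment.

```python
import math

def delta_theta(v, dt, theta, d):
    """Equation 1: angular change (radians) seen from the object at distance d
    when the camera moves at speed v for time dt; theta is the angle between SA and AB."""
    return v * dt * math.sin(theta) / d

# Hypothetical values: camera moving at 10 m/s, 1/30 s between frames,
# object located directly to the side of the direction of travel (theta = 90 degrees).
v, dt, theta = 10.0, 1.0 / 30.0, math.pi / 2
near = delta_theta(v, dt, theta, d=5.0)    # object 5 m away
far = delta_theta(v, dt, theta, d=500.0)   # object 500 m away
print(near, far)
```

With these assumed values the nearby object produces an angular change 100 times larger than the distant one, which is the behavior described in the paragraph above.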

In this way, by obtaining the differences in lateral movement of the objects shown in the moving image shot by the camera 200, the distance from the camera to the object can be calculated for each pixel of the frame image. FIG. 3 schematically shows a configuration well known as the classical technique using dynamic parallax (motion parallax).

A technique in which the dynamic parallax (motion parallax) approach shown in FIG. 3 is separated horizontally is generally called stereo vision. In stereo vision, A and B in FIG. 3 correspond to the human left eye and right eye, respectively. In this case, movement of the camera is not considered. However, as long as one remains confined to this classical framework, that is, stereo vision, there are limitations on the acquisition of distance data, as described later with reference to FIG. 6.

Based on a moving image shot by a single moving camera, the CPU 104 of the image distance calculation device 100 obtains, in time series, the change in position of the object shown in the shot frame images, and thereby performs processing to obtain, for each pixel, the distance to the object shown in the frame image.

As described above, data in which a plurality of frame images are recorded in time series is recorded in the recording unit 101 as a moving image. As shown in FIG. 4, the CPU 104 of the image distance calculation device 100 treats the moving image as a three-dimensional space (spatiotemporal pattern), with the vertical axis of the frame image as the y axis, the horizontal axis as the x axis, and the time-series element as the t axis. That is, a pixel of a frame image can be expressed as f(x, y, t) using coordinates in this three-dimensional space. Here, f normally has the color components R, G, and B (red, green, and blue). Note that 1 ≤ x ≤ X, 1 ≤ y ≤ Y, and 1 ≤ t ≤ T, where X is the maximum number of pixels in the horizontal (width) direction of the frame image, Y is the maximum number of pixels in the vertical (height) direction of the frame image, and T is the duration of the shot video. The value of the time T is assumed to be equal to the number of the last frame image. The CPU 104 of the image distance calculation device 100 according to the present embodiment extracts a frame image at an arbitrary time from the moving image shot by the camera 200 (S.1 in FIG. 2). As shown in FIG. 4, the extracted frame image corresponds to the frame image at time t = 1 mentioned above. In general, however, a frame image at any time may be used. As described later, when the distance is to be obtained for each pixel over a wide-area scene, frame images need to be extracted at a number of different times.

When the moving image is treated as a three-dimensional space in this way, a slice image can be generated from the y-axis elements and the t-axis elements of the frame images by fixing the x coordinate of the frame image at an arbitrary value x = x0 (S.2). The slice image can be expressed as g(t, y) (= f(x0, y, t)), where 1 ≤ y ≤ Y and 1 ≤ t ≤ T. The frame image at time t = 1 can be expressed as f(x, y, 1), where 1 ≤ x ≤ X. In the present embodiment, for convenience of explanation, the shooting time t is set to 1 ≤ t ≤ 175.
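
The relation g(t, y) = f(x0, y, t) can be illustrated with an array sketch. This is only an illustration under stated assumptions: the video is assumed to be held as a NumPy array of shape (T, Y, X, 3), indices are 0-based rather than 1-based as in the text, and the function name `make_slice_image` and the random test video are hypothetical.

```python
import numpy as np

def make_slice_image(video, x0):
    """Build the slice image g from a video volume f.

    video: array of shape (T, Y, X, 3), i.e. video[t, y, x] holds the RGB value f(x, y, t).
    Returns g of shape (Y, T, 3) with g[y, t] == video[t, y, x0],
    so that t runs along the horizontal axis of the slice image.
    """
    column_over_time = video[:, :, x0, :]              # shape (T, Y, 3)
    return np.transpose(column_over_time, (1, 0, 2))   # shape (Y, T, 3)

# Hypothetical video: 175 frames of 48 x 64 RGB pixels filled with random values.
T, Y, X = 175, 48, 64
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(T, Y, X, 3), dtype=np.uint8)
g = make_slice_image(video, x0=10)
print(g.shape)
```

Each column of `g` is the pixel column at x0 of one frame, so the slice image records how that single column changes over the 175 frames.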

FIG. 5(a) shows the frame image f(x, y, 1) at t = 1, and FIG. 5(b) shows the slice image g(t, y) at x = x0 (the position x = x0 is indicated in FIG. 5(a)). The images in FIGS. 5(a) and 5(b) were generated from a moving image of the opposite bank of a river, shot from the near bank while the camera 200 moved from left to right. Specifically, the moving image was shot with the camera 200 through the window of a vehicle traveling along the riverbank. Consequently, vertical vibration, displacement, and the like occurred while the camera 200 moved from left to right. The moving image shot by the camera 200 is therefore not video involving perfectly parallel translation.

The slice image shown in FIG. 5(b) is the slice image at x = x0; the left end of the horizontal axis t is t = 1 and the right end is t = 175 (= T). Compare the frame image of FIG. 5(a) with the slice image of FIG. 5(b). Among the objects shown in the frame image, those located far from the shooting position of the camera 200 (for example, the buildings and the embankment on the opposite bank of the river) are recorded (appear) in the slice image in much the same state as in the frame image, with little compression of the image in the t-axis direction (compression of the inter-pixel distance). In contrast, objects in the frame image located near the shooting position of the camera 200 (for example, the grass and ground on the near side of the river) are recorded (appear) in the slice image in a state more compressed (with the inter-pixel distance compressed) than in the frame image.

Comparing FIGS. 5(a) and 5(b), the compression ratio from the frame image to the slice image (the compression ratio of the image, that is, of the inter-pixel distance) is about 1 for the farthest objects, whereas it is about 4 for the nearest objects. This difference in compression ratio is proportional to the distance from the camera 200. Moreover, the compression ratio is not limited to four discrete levels from 1 to 4; it can be evaluated in an analog manner, that is, continuously (in many steps), in proportion to the distance. Accordingly, the distance from the camera to the object can be obtained continuously (in many steps) over a wider dynamic range (scale and range) based on the state of compression.

In this respect, as already explained, with the stereo vision method (the method of calculating the distance to an object using the parallax between two images), when the distance between the cameras is small, only about four levels of disparity can be distinguished as regards the accuracy of calculating distances between far and near. For this reason, it has been difficult to ensure sufficient distance calculation accuracy with the ordinary stereo vision method. When the distance between the cameras is increased, larger disparity values can in principle be obtained, but it becomes difficult to detect the corresponding pixels in the two images. In contrast, the image distance calculation device 100 according to the present embodiment can improve the accuracy of distance calculation between far and near continuously (in many more steps), rather than in about four steps, and can obtain distances over a wider dynamic range.

In this way, the cumulative state of the dynamic parallax (motion parallax) is expressed explicitly and statically in the slice image through the state of compression of the image of the object. The image distance calculation device 100 obtains the distance from the camera 200 to the object for each pixel of the frame image, based on the compression state of the slice image (the compression state of each pixel of the slice image), in which the cumulative state of the dynamic parallax is expressed. In the present embodiment, the method of obtaining, for each pixel, the distance from the camera 200 to the object using the image distance calculation device 100 is referred to as the cumulative parallax method.

FIG. 6 is a table showing the differences among the conventional stereo vision method (the method using the parallax between two images), the conventional Epipolar-Plane-Image (EPI) method, and the cumulative parallax method. In the stereo vision method, two images shot simultaneously by two cameras are used, and either feature points are extracted from each image or matching is performed by line-based dynamic programming. The parallax is implicit in the two images, and by obtaining it from the matching of the two images, the distance to the object can be obtained. However, the dynamic range of the distances obtained is relatively narrow.

The EPI method extracts line segments from the slice image; each line segment corresponds to one point of the photographed target object, and the slope of the extracted line segment corresponds to the distance. Because the number of extracted line segments is far smaller than the number of points representing the object, the points representing the photographed target object are obtained only sparsely. This makes it difficult to map a texture onto the surface.

For the EPI method, the following documents are helpful.
[1] Masanobu Yamamoto, "Extraction of 3D information from continuous stereo images", IEICE Transactions D, Vol. J69-D, No. 11, pp. 1631-1638, November 25, 1986
[2] Robert C. Bolles, H. Harlyn Baker, David H. Marimont, "Epipolar-Plane Image Analysis: An approach to Determining Structure from Motion", Inter. Journal of Computer Vision, Issue 1, pp. 7-55, (1987)

In contrast, in the cumulative parallax method, matching is performed using a frame image and a slice image by a dynamic programming method described later (the line-to-image DP (dynamic programming) matching method). In the slice image, the accumulation of dynamic parallax is shown explicitly and statically through the state of compression. By using this compression state, the distance to the object can be obtained. The dynamic range of the distances obtained is wider than that of the conventional stereo vision method.

FIG. 3 was used to describe the case where the camera 200 shoots an object while moving at a constant speed v. Here, the amount of change v(x, y, t)Δt by which a coordinate point (x, y) in space changes depending on the time t as the camera 200 moves is expressed as a velocity (the dynamic parallax in terms of pixel velocity). The speed of the camera 200 can then be regarded as the speed at which pixels move on the screen. Accordingly, the change along the x axis of the image (x, y), Δx(t, y) = x(t + Δt, y) − x(t, y), is the velocity. Therefore, as in FIG. 3, Δx(t, y) sinθ = (x(t + 1, y) − x(t, y)) sinθ = dΔθ holds.

A point to note here is that, first, the accumulated dynamic parallax (Accumulated Motion Parallax: AMP) at the end time T is obtained by calculation as x(T, y′). Next, each x(T, y′) that determines x(T, y′) is obtained by a subsequent backtrace process. FIGS. 5(a) and 5(b) are presented as a model for the case where the time differences are then produced. FIG. 3, on the other hand, assumes that Δθ is obtained as the disparity. Accordingly, the stereo vision shown in FIG. 3 does not include the concept of accumulating parallax.

The image distance calculation device 100 according to the present embodiment introduces a concept called accumulated dynamic parallax (accumulated motion parallax). First, let f(x(t, y), y, t0) be the pixel of the frame image corresponding to the pixel g(t, y) of the slice image. Likewise, let f(x(t, y) + Δx(t, y), y, t0) be the pixel of the frame image corresponding to the pixel g(t + 1, y) of the slice image. In the image distance calculation device 100 according to the present embodiment, shooting is performed while the camera 200 is moved laterally (in a substantially horizontal direction). Therefore, when t increases by one on the horizontal axis t of the slice image, the coordinate (in the x-axis direction) of the corresponding pixel f of the frame image changes by Δx(t, y).

Here, the value of the displacement Δx(t, y) of the frame image pixel in the x-axis direction differs greatly depending on the distance from the camera 200 to the object. That is, when the object shown at pixel (x, y) of the frame image is far from the camera 200, the pixel displacement Δx(t, y) in the frame image is close to 1. On the other hand, when the object is near the camera 200, the pixel displacement Δx(t, y) in the frame image is greater than 1.

FIG. 7 schematically shows, using black dots (●), the positions of the frame image pixels that correspond to pixels of the slice image. The vertical direction in FIG. 7 corresponds to the y axis of the frame image, and the horizontal direction corresponds to the x axis of the frame image. Each black dot represents a pixel of the frame image that corresponds to a pixel of the slice image. For simplicity, 20 pixels (black dots) are shown in the horizontal direction of FIG. 7, and the spacing between adjacent pixels (black dots) is wider in some places and narrower in others. Each pixel (black dot) schematically represents the pixel of the slice image at one time t (t = 1 to 20). The leftmost pixel (black dot) indicates the position of the frame image pixel corresponding to the slice image pixel at time t = 1. The rightmost pixel (black dot) in each row indicates the position of the frame image pixel corresponding to the slice image pixel at the last time t = 20. The frame image pixel point corresponding to the last time t (= 20) of the slice image is called the spotting point.

In the image distance calculation device 100 according to the present embodiment, shooting is performed while the camera 200 is moved laterally (in a substantially horizontal direction), so pixels for 20 unit times, from time t = 1 to time t = 20, are recorded in the horizontal direction of the slice image. In the frame image, on the other hand, as shown in FIG. 7, the spacing of the 20 pixels (black dots) recorded in the slice image differs for each value of y. The spacing differs because, as described above, Δx(t, y) varies with the distance from the camera 200 to the object. Accordingly, in FIG. 7, an object for which the spacing between adjacent pixels (black dots) is narrow is far from the camera 200, and an object for which the spacing is wide is near the camera 200.

The position of the spotting point at time t = 20 differs for each value of y because the accumulated Δx(t, y) differs for each value of y. Since the difference between the coordinate x(t, y) of a given pixel (black dot) and the coordinate x(t + 1, y) of the pixel (black dot) to its right is Δx(t, y), the coordinate x(T, y) of the spotting point pixel (T = 20) can be expressed as the accumulation of the differences Δx(t, y) between adjacent pixels, that is, ΣΔx(τ, y) (where the sum is taken from τ = 1 to τ = t − 1). As is clear from this, the pixel at the rightmost end of the slice image (the last time of the moving image), that is, the spotting point, is a pixel that cumulatively contains the dynamic parallax of the shot moving image.
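
The accumulation of Δx into the spotting-point coordinate can be sketched with a cumulative sum. The example below is an illustration only: the function name `accumulate_positions`, the 0-based offset from x0, and the constant per-row displacements (1 pixel per step for a far row, 3 pixels per step for a near row) are assumptions, not values from the embodiment.

```python
import numpy as np

def accumulate_positions(dx, x0=0):
    """Accumulate per-step displacements into frame-image x coordinates.

    dx: integer array of shape (Y, T-1); dx[y, t] is the displacement between
        the frame-image pixels matched to slice-image times t and t+1 on row y.
    Returns x of shape (Y, T); x[y, 0] == x0 and x[y, -1] is the spotting point.
    """
    start = np.full((dx.shape[0], 1), x0, dtype=dx.dtype)
    return np.concatenate([start, x0 + np.cumsum(dx, axis=1)], axis=1)

# Hypothetical displacements for T = 20: a far row (1 px per step) and a near row (3 px per step).
dx = np.array([[1] * 19, [3] * 19])
x = accumulate_positions(dx, x0=0)
x_T = x[:, -1]          # spotting points for each row
print(x_T)
```

Under these assumed displacements the near row ends three times farther from x0 than the far row, matching the observation that spotting points differ for each value of y.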

FIG. 8(a) shows, as an example, the frame image at time t = 1 obtained from a moving image shot for 175 unit times, from time t = 1 to t = 175, by the camera 200 moving laterally (in a substantially horizontal direction). FIG. 8(b) shows, as an example, the slice image generated from the video from time t = 1 to t = 175 at the point (line) x0 indicated in FIG. 8(a).

The slice image shown in FIG. 8(b) statically records how each pixel on x0 of the frame image shown in FIG. 8(a) changes from time t = 1 to t = 175. Examining the frame image shown in FIG. 8(a) rightward from x0 and comparing it with the slice image reveals that the parts of the slice image corresponding to the frame image are compressed (the pixels are compressed). The degree of compression differs depending on the distance from the camera 200 to the object.

The pair of upper and lower curves (dotted curves) L1 and L2 shown on the left side of FIG. 8(b) are the arrangements of slice image pixels corresponding to two points on the y coordinate at x0 of the frame image shown in FIG. 8(a); that is, they are extracted arrangements of the pixels corresponding to pixels of the frame image. If the camera 200 were moving perfectly horizontally (in parallel) with respect to the object, each line (dotted line) would be a horizontal straight line. In the image distance calculation device 100 according to the present embodiment, however, the moving image is shot while the camera 200 is not perfectly horizontal (parallel) but vibrates vertically (in the y direction), and the lines are therefore curved. In general, the camera 200 cannot be expected to move perfectly horizontally with respect to the scene, so curves such as those described above arise as a matter of course.

The pixels x_T corresponding to the spotting points (time t = 175 = T) on these upper and lower curves are indicated by white arrows at the upper and lower positions on the frame image of FIG. 8(a). For each curve shown on the left side of FIG. 8(b), the distance from the start point (the pixel at t = 1) to the end point (the pixel at t = 175 = T) is approximately equal to the width of the slice image. In the frame image, however, the spotting points (the positions of the pixels at t = 175 = T) lie at different distances from x0. The distance from x0 to the spotting point x_T shown in the lower part of the frame image is longer than the distance from x0 to the spotting point x_T shown in the upper part. These distances differ because the accumulated Δx(t, y) described above differs for each value of y.

Therefore, with the value of y in the frame image fixed, if the points corresponding to x(1, y), x(2, y), ..., x(T, y) at each time from t = 1 to t = T (= 175) can be found among the pixels of the slice image, the correspondence between the pixels of the frame image and the pixels of the slice image can be obtained.

FIG. 9 schematically shows the correspondence, at a given y′, between the pixels x(t) on a pixel line of the frame image and the pixels g(t, y) of the slice image. The pixel x(t) on the pixel line of the frame image consists of the accumulation of the dynamic parallax Δx(t) from t = 1 to t − 1. The lower left part of FIG. 9 shows that the correspondence between x(t) and t is nonlinear; the y axis of the right part is not drawn in this part. To obtain the correspondence between the pixel points on the line of the frame image with y′ fixed and the slice image, the CPU 104 of the image distance calculation device 100 uses a matching method called line-to-image DP (Line-image continuous dynamic programming). The CPU 104 first uses the line-to-image DP matching method (dynamic programming) to obtain the spotting point of the frame image at time t = T, based on the optimal pixel-by-pixel correspondence between the values of x(t) at y′ of the frame image, that is, the one-dimensional accumulated dynamic parallax Δx(t), and the two-dimensional slice image g(t, y) spanned by the t axis and the y axis. The CPU 104 then performs a backtrace process that traces the optimal points back from the obtained spotting point toward t = 1, and thereby obtains the entire correspondence between the pixels of the slice image and the pixels of the frame image, that is, all corresponding points from t = 1 to t = T.

The line-to-image DP matching method is characterized by applying dynamic programming between x on the line obtained by fixing the y-axis value of the frame image to y′ and the two-dimensional image in (t, y). Fixing the y-axis value in this way makes the starting pixel on the line of the frame image coincide with the starting pixel of the slice image. Fixing the y-axis value in the frame image is the condition setting for the DP matching described so far.

The line-to-image DP matching method shown in FIG. 10 is based on the existing image-to-image DP matching method, but it forms the side-face image by stacking copies of the single line pattern obtained by fixing the y value of one image to y′. Starting from y = y′ of the other image, the optimal value is computed in the three-dimensional space, and by finding the optimal accumulated value at a point on the side face, the point that gives the optimal accumulated value from the starting point can be defined as the spotting point. The image forming the side face is formally a two-dimensional image, and the other image is also two-dimensional, so the process looks like image-to-image matching. Because the side-face image consists of a single repeated line sequence, however, it is in substance line-to-image matching. It also provides a spotting function. The algorithm sits between the conventionally known line-to-line DP matching method, which matches one one-dimensional line against another, and the image-to-image DP matching method, which matches one two-dimensional image against another.

The line-to-line DP matching method and the image-to-image DP matching method (dynamic programming that finds the correspondence between two two-dimensional images) are known techniques. They are described in detail in, for example, Ryuichi Oka and two others, "On a General Scheme of Continuous DP: Optimal Full-Pixel Matching for Image Spotting", IEICE Technical Report, PRMU2010-87, IBISML2010-59 (2010-09), and in Japanese Unexamined Patent Application Publication No. 2010-165104. Processing by the line-to-image DP matching method can therefore be realized by applying these DP matching methods. As noted above, however, doing so requires a device such as "using only the line pattern obtained by fixing the y value of one image to y′".

FIG. 10 illustrates the algorithm of the line-to-image DP matching method (the DP matching algorithm). Various configurations of the algorithm are possible depending on the set of local paths chosen. The one shown in FIG. 10 implements the search for the correspondence shown in FIG. 9. FIG. 10 considers a three-dimensional computation space. The coordinates of the slice image (t, y) (the coordinates of a two-dimensional plane image) are set on the bottom face, and copies of the pixel row (a one-dimensional line) along x of the frame image at fixed y′ are lined up on the left side face, so that the corresponding coordinates of the frame image can be obtained. Because the fixed y′ effectively corresponds to a coordinate on the vertical axis of the frame image, the left side face is simply the same single line of the frame image repeated along the y axis. The correspondence is obtained by performing dynamic programming matching (DP matching processing) starting from (t, y′) = (1, y′) on the bottom face. At the same time, the point (T, x*, y*) that gives the optimal accumulated value is a point on the side face, and since x* is a point in the interval [1, X], the method can be said to perform spotting.

As shown in FIG. 10, the line-to-image DP matching algorithm of the present embodiment is set up so that a correspondence of at most 1:4 between the t axis (the time axis) and the x axis is allowed globally. Specifically, the t axis in FIG. 10 ranges from t = 1 to t = T, and the x axis ranges from x = 1 to x = 4T. That is, the maximum value of the x axis is four times the maximum value T of the t axis.

The relationship between the interval lengths of the x axis and the t axis is determined by the degree of compression of each photographed object as it appears in the frame image and the slice image. In other words, it is set by the ratio between the distances of near and far objects from the camera 200. As explained with reference to FIG. 7, the spacing between a corresponding pixel (black dot) and the next one differs between the slice image and the frame image depending on the distance from the camera to the object, and this difference in spacing reflects the difference in distance from the camera to the object. Accordingly, the difference in distance from the camera to the object can be determined from how many pixels apart two adjacent pixels of the slice image appear in the frame image, and the correspondence between the interval lengths of the x axis and the t axis can be determined from the accumulation of this pixel stretching. The multiple of T used for the maximum value of the x axis is decided from the maximum of this accumulated frame length. The degree of local pixel stretching is given by Δx(t, y) = x(t, y) − x(t − 1, y).

The matching algorithm based on dynamic programming shown in FIG. 10 (the DP matching algorithm) can take various forms depending on the combination of local paths. As one example, the image distance calculation device 100 uses the form given by the equations below. It allows local pixel contraction, that is, scaling from the frame image to the slice image, by a factor of 1 to 4. Because local scaling of 1 to 4 times is allowed, global scaling of 1 to 4 times is also allowed. The DP matching equations below express this allowance of 1 to 4 times local variation. In general, the allowed range of multiples can be set arbitrarily in the dynamic programming equations.

First, denoting the coordinates of the three-dimensional space shown in FIG. 10 by (x, y, t), the line (frame image) on the left side face is, as already explained, the line pattern f(x, y′, t0), 1 ≤ x ≤ X, obtained by fixing y′, repeated along the y axis. Setting f(x, y′, t0) = r(x), using r(x) as shown in FIG. 10, the side-face image is, more precisely, r(x, y), 1 ≤ x ≤ X, 1 ≤ y ≤ Y.

The constraint r(1, y′) = g(1, y′) is imposed.

The slice image is denoted by g(t, y), and the local distance computed in the DP matching algorithm is denoted by d(t, x, y).

The local distance is given by d(t, x, y) = |g(t, y) − r(x)|. At y = y′, the initial value D(1, 1, y′) = d(1, 1, y′) is set, and D(t, x, y) = ∞ is set for all (t, x, y) other than (1, 1, y′).
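The local distance and the initialization of the accumulated value D can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of nested lists for g and r, the conversion from the text's 1-based indices, and the toy intensity values are all assumptions made for the example.

```python
import math
from collections import defaultdict

def local_distance(g, r, t, x, y):
    # d(t, x, y) = |g(t, y) - r(x)| with the 1-based indices used in the text.
    # g[t-1][y-1] is the slice-image value, r[x-1] the frame-line value.
    return abs(g[t - 1][y - 1] - r[x - 1])

def init_accumulated(g, r, y_prime):
    # Every cell (t, x, y) starts at infinity except the start point (1, 1, y').
    D = defaultdict(lambda: math.inf)
    D[(1, 1, y_prime)] = local_distance(g, r, 1, 1, y_prime)
    return D

# Hypothetical toy data: T = 2 time steps, Y = 3 rows, X = 5 pixels on the line.
# g(1, 2) equals r(1), matching the constraint r(1, y') = g(1, y') for y' = 2.
g = [[10, 20, 30], [12, 22, 32]]
r = [20, 21, 25, 30, 40]
D = init_accumulated(g, r, y_prime=2)
```

Representing the unset cells by a default of infinity keeps the start point as the only cell from which a finite accumulated value can propagate.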

The values of t, x, and y are taken in the ranges
    t ∈ [1, T], y ∈ [max(1, y′ − t), min(y′ + t, Y)],
    x ∈ [t, 4t − 3] (= [t, 4(t − 1) + 1]).
Under these conditions, D(t, x, y) is obtained using Equation 2 below.

    ... (Equation 2)

Next, at y = y′, the initial value w(1, 1, y′) = 1 is set, and w(t, x, y) = 0 is set for all (t, x, y) other than (1, 1, y′). Based on this initialization, w(t, x, y) is obtained using Equation 3 below.

    ... (Equation 3)

Equation 2 for D(t, x, y) is the accumulation formula for nonlinear matching of local distances. The nonlinearity is that a line segment of the frame image may shrink in the slice image to between 1 and 1/4 of its length in the x direction, and may drift in the y direction by at most T pixels upward and at most T pixels downward from y′ by time T. The allowance for drift along the y axis assumes that the camera 200 does not move perfectly parallel to the photographed object.
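Equations 2 and 3 are not reproduced in this text, so the following is only a hedged sketch of a forward pass that respects the stated index ranges. It assumes a simple local-path set in which each time step advances x by 1 to 4 pixels and y by −1, 0, or +1, and it assumes every step has coefficient 1, so that w merely counts steps. The actual recurrences in the patent may use different paths and weights. The function name and the toy data are hypothetical.

```python
import math

def line_image_dp(g, r, y_prime):
    """Forward pass over (t, x, y); g[t-1][y-1] is the slice image, r[x-1] the frame line."""
    T, Y, X = len(g), len(g[0]), len(r)
    start = (1, 1, y_prime)
    D = {start: abs(g[0][y_prime - 1] - r[0])}  # accumulated local distance
    w = {start: 1}                               # accumulated coefficient sum
    back = {}                                    # predecessor of each reached cell
    for t in range(2, T + 1):
        for x in range(t, min(4 * t - 3, X) + 1):
            for y in range(max(1, y_prime - t), min(y_prime + t, Y) + 1):
                best, best_prev = math.inf, None
                # Assumed local paths: x advances by 1..4, y by -1, 0 or +1.
                for k in (1, 2, 3, 4):
                    for dy in (-1, 0, 1):
                        prev = (t - 1, x - k, y - dy)
                        if prev in D and D[prev] < best:
                            best, best_prev = D[prev], prev
                if best_prev is not None:
                    cur = (t, x, y)
                    D[cur] = best + abs(g[t - 1][y - 1] - r[x - 1])
                    w[cur] = w[best_prev] + 1
                    back[cur] = best_prev
    return D, w, back

# Hypothetical toy data with y' = 2: the values 5, 7, 9 in row 2 of the slice
# image reappear on the frame line at x = 1, 3, 6.
g = [[5, 5, 5], [0, 7, 0], [0, 9, 0]]
r = [5, 1, 7, 1, 1, 9, 1, 1, 1]
D, w, back = line_image_dp(g, r, y_prime=2)
```

With these assumed paths, the cells reachable at time t are exactly x ∈ [t, 4t − 3], which is why the x range in the text can be used directly as the loop bound.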

The optimal accumulated value of the local distance is obtained in the range x = T to x = 4T on the left side face shown in FIG. 10. The sum of the coefficients used along the way to this optimal accumulated value is calculated for all x, y, and t; w(t, x, y) above is the recurrence for this sum of coefficients. Its value at the end of the time axis, w(T, x, y), is used to normalize the accumulated value D(T, x, y). Here, normalization means compensating for differences in the length of the paths leading to the accumulated value.

After the above calculation has been carried out in the three-dimensional space (x, y, t), that is, completed within the rectangular box shown in FIG. 10, the CPU 104 calculates the spotting point (T, x*_T, y*_T) based on Equation 4 below (S.3 in FIG. 2). As explained with reference to FIG. 7, the spotting point is the pixel of the frame image that corresponds to the pixel of the slice image at the final time T. However, it is not known in advance which pixel along the x axis at the given y′ of the frame image (which pixel of the pixel row) coincides with the end of the matching line (the corresponding line r(x)) that runs from (t, y) = (1, y′) of the slice image to t = T. The spotting point is therefore calculated in order to determine (spot) this coincident point. The calculation is given by Equation 4.

    ... (Equation 4)

In Equation 4, "arg" denotes the function that returns the variables at which the minimum is attained.
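Equation 4 itself is not reproduced in this text. Based on the description of "arg", "min", and the use of w(T, x, y) to normalize D(T, x, y), one plausible reading is that the spotting point minimizes D(T, x, y) / w(T, x, y) over the cells reached at time T. The sketch below illustrates that reading only; the function name, the dictionary layout, and the numbers are assumptions.

```python
def spotting_point(D, w, T):
    """Return (x*, y*) minimising D(T, x, y) / w(T, x, y) over cells reached at time T."""
    candidates = [(D[k] / w[k], k[1], k[2]) for k in D if k[0] == T]
    _, x_star, y_star = min(candidates)
    return x_star, y_star

# Hypothetical accumulated values at T = 3, keyed by (t, x, y).
D = {(3, 4, 2): 6.0, (3, 6, 2): 3.0, (3, 7, 3): 4.0}
w = {(3, 4, 2): 3, (3, 6, 2): 3, (3, 7, 3): 5}
x_star, y_star = spotting_point(D, w, T=3)
```

In this toy case the smallest raw accumulated value is at (x, y) = (6, 2), but after dividing by w the minimum moves to (7, 3), which shows why path-length normalization can change the selected point.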

After the spotting point (T, x*_T, y*_T) has been calculated, the CPU 104 of the image distance calculation device 100 obtains the trajectory from (t, y) = (1, y′) to the spotting point by the backtrace process (S.4 in FIG. 2).

The backtrace process obtains the trajectory from the spotting point (T, x*_T, y*_T) to (1, 1, y′), with (1, 1, y′) as the final point of the backtrace, by decreasing t one step at a time: t = T, T − 1, T − 2, ..., 1. The backtrace process is carried out based on Equation 5 below.

    ... (Equation 5)

The backtrace process makes it possible to calculate which pixel in the pixel row (line) along the x axis at the given y′ of the frame image corresponds to the pixel of the slice image at time t. For convenience, the corresponding point (spotting point) of the frame image at time T is written x(T, y′). The position of this spotting point differs depending on the given y′.
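The backtrace can be illustrated with a predecessor table. Equation 5 is not reproduced in this text, so the sketch below assumes that the forward pass recorded, for every reached cell (t, x, y), the cell it was reached from; the table contents and the names `back` and `backtrace` are hypothetical.

```python
def backtrace(back, spot, start):
    """Follow predecessor links from the spotting point back to the start point."""
    path = [spot]
    while path[-1] != start:
        path.append(back[path[-1]])
    path.reverse()  # report the trajectory in increasing t
    return path

# Hypothetical predecessor table keyed by (t, x, y).
back = {(3, 6, 2): (2, 3, 2), (2, 3, 2): (1, 1, 2)}
path = backtrace(back, spot=(3, 6, 2), start=(1, 1, 2))

# x(t): the frame-image pixel matched to the slice-image pixel at time t.
x_of_t = {t: x for t, x, _ in path}
```

The resulting mapping x(t) is the quantity from which the per-step position changes Δx(t) are later derived.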

Further, writing the spotting points of the frame image at y′ as x(1), x(2), ..., x(T), with y′ omitted, the change in position of the spotting point at time t in the frame image can be written Δx(t). Let Δθ(t) be the angle formed at time t between the camera 200 and the photographed object; the unit of Δθ(t) is the radian. Comparing this angle Δθ(t) with the change in position Δx(t) of the spotting point at time t, Δθ_x(t) in the x direction of the frame image and the change in position Δx(t) can be judged to have the relationship of Equation 6 below.

    ... (Equation 6)

That is, Δx(t) can be regarded as equal to the dynamic parallax (motion parallax) at time t. The accumulated dynamic parallax can therefore be expressed by Equation 7 below. The important point here is that the spotting point x(T) is obtained first, and x(t), t = 1, ..., T − 1, are then obtained by the backtrace process. The relationships of Equation 6 and Equation 7 can therefore be said to hold after the fact.

    ... (Equation 7)

Here, x(0) = 0. x(T) is the value of the dynamic parallax accumulated up to time T in the pixel row (line) along the x axis at the given y′ of the frame image, and x_T / T corresponds to the normalized value of the accumulated dynamic parallax. By using this accumulation of dynamic parallax, the image distance calculation device 100 of the present embodiment can calculate the distance of each pixel in the frame image.
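The relationship between the per-step dynamic parallax and its accumulation can be checked numerically. The sketch below assumes Δx(t) = x(t) − x(t − 1) with x(0) = 0, as stated in the text, so that the sum of Δx(t) up to T equals x(T); the sample positions are hypothetical.

```python
def dynamic_parallax(x):
    """x[t] for t = 0..T with x[0] = 0; returns [dx(1), ..., dx(T)]."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

x = [0, 1, 3, 6, 7]        # x(0)..x(4), a hypothetical backtrace result
dx = dynamic_parallax(x)   # per-step motion parallax
T = len(x) - 1
accumulated = sum(dx)      # equals x(T) because x(0) = 0
normalized = accumulated / T
```

A larger normalized value means the line stretched more per time step, which in this method is associated with objects closer to the camera.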

Next, the method of obtaining the distance of each pixel in the frame image is described.

Between the frame image and the slice image there is a nonlinear reduction from the frame image to the slice image (compression of the inter-pixel distance, that is, compression of the image). By converting the degree of this reduction into a distance, the distance from the camera 200 to the photographed object can be calculated for each pixel (each point) of the frame image. When considering the correspondence between the frame image and the slice image, occluded portions may exist between the two images. Occlusion is the state in which an object in the foreground hides an object behind it in three-dimensional space. That is, an object photographed by the moving camera is temporarily hidden by an object in front of it, so that a one-to-one correspondence between the frame image and the slice image does not hold. However, occlusion occurs only in part of the image, and from the surrounding context the pixels involved often have similar distances. For this reason, the parameters for converting pixels into distances are extracted by treating the task as a correspondence problem between the two images. That is, once the correspondence between the frame image and the slice image is established, the distance from the camera 200 to the object shown at each pixel of the frame image can be obtained.

In the present embodiment, the correspondence between the frame image and the slice image is obtained in two stages. The first is a correspondence between "regions", each consisting of multiple pixels. The second is a correspondence for each individual pixel. The reason for the first stage is that the distance of the scene from the camera is roughly uniform within a region, and that matching regions is easier than matching individual pixels from the outset. The reason for the second stage is that a more detailed correspondence can be made on the basis of the first result. A distance is obtained for the pixels at each stage. In the first stage, all pixels of a region receive the same distance. Finally, the results of the two stages are integrated.

The line-to-image DP matching process used in the image distance calculation device 100 of the present embodiment in principle obtains a correspondence for each pixel. However, because the relationship between the frame image and the slice image involves occlusion, and because the line-to-image DP matching process is nonlinear, it is difficult to obtain the pixel correspondence completely and accurately. Determining distance values region by region from the correspondence between regions (region segmentation processing) is therefore taken as the first process of the first stage. Among existing region segmentation methods, one of the most effective is the mean-shift method. The mean-shift method is a widely known region segmentation method and is provided by OpenCV (Open Source Computer Vision Library), a widely available open-source library for computer vision. Anyone can therefore use the mean-shift method.

The CPU 104 of the image distance calculation device 100 applies the mean-shift method (region segmentation processing) to the frame image and the slice image (S.5 in FIG. 2). In doing so, the CPU 104 uses common parameters (a common segmentation criterion) for both images, because corresponding segmented regions become difficult to find if different parameters are applied.

FIGS. 11(a) and 11(b) show the frame image and the slice image after the mean-shift method has been applied. As a comparison with the frame image and slice image of FIGS. 5(a) and 5(b) makes clear, in the segmented frame image and slice image of FIGS. 11(a) and 11(b) the portions judged to belong to the same region are filled with a common color. These color differences make it possible to distinguish identical regions from different ones.

Portions judged by the mean-shift method to belong to the same region can be considered to have roughly the same distance (from the camera 200 to the photographed object). Comparing the segmented frame image with the segmented slice image, the two images are related nonlinearly, but the way the regions are formed can be considered similar. The CPU 104 of the image distance calculation device 100 therefore obtains the region correspondence between the two images from the frame image and the slice image segmented by the mean-shift method, using the pixel correspondence produced by the line-to-image DP matching process and the backtrace process.

FIG. 12 is a schematic diagram illustrating the region correspondence between the slice image and the frame image. The line-to-image DP matching process and the backtrace process have already established the correspondence between each pixel of the slice image and each pixel of the frame image. As shown in FIG. 12, the CPU 104 therefore compares the pixels located in each region of the slice image segmented by the mean-shift method with the pixels located in each region of the frame image segmented in the same way. The CPU 104 then determines that the regions sharing the largest number of corresponding pixels correspond to each other (S.6 in FIG. 2: corresponding-region determination process).

That is, as in the example shown schematically in FIG. 13(a), to find the region of the frame image corresponding to region A1 of the slice image, the CPU 104 finds the pixels (black dots) of the frame image that correspond to the four pixels (black dots) in region A1 of the slice image, and then finds the region of the frame image that contains the most of these corresponding pixels. In FIG. 13(a), the region of the frame image containing the most pixels corresponding to the pixels of region A1 is region A2, so the CPU 104 determines that the corresponding region of the frame image for the segmented region A1 of the slice image is the segmented region A2. In the same way, the CPU 104 determines region B2, which contains the most frame-image pixels corresponding to the pixels of region B1 of the slice image, to be the corresponding region of B1, and region C2, which contains the most frame-image pixels corresponding to the pixels of region C1, to be the corresponding region of C1.
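The corresponding-region decision amounts to a majority vote over the matched pixels. The sketch below assumes that the pixel correspondences and the per-pixel region labels are already available as plain dictionaries; the function name, the data layout, and the sample coordinates are hypothetical and only mirror the A1/A2, B1/B2 example.

```python
from collections import Counter

def corresponding_regions(pixel_pairs, slice_labels, frame_labels):
    """Map each slice-image region to the frame-image region holding most of its matched pixels."""
    votes = {}
    for slice_px, frame_px in pixel_pairs:
        s = slice_labels[slice_px]   # region of the slice-image pixel (t, y)
        f = frame_labels[frame_px]   # region of the matched frame-image pixel (x, y)
        votes.setdefault(s, Counter())[f] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in votes.items()}

# Hypothetical matches: ((t, y) in the slice image, (x, y) in the frame image).
pixel_pairs = [((1, 1), (1, 1)), ((2, 1), (3, 1)), ((3, 1), (5, 1)),
               ((4, 1), (9, 1)), ((5, 1), (10, 1)), ((6, 1), (12, 1))]
slice_labels = {(1, 1): "A1", (2, 1): "A1", (3, 1): "A1", (4, 1): "A1",
                (5, 1): "B1", (6, 1): "B1"}
frame_labels = {(1, 1): "A2", (3, 1): "A2", (5, 1): "A2", (9, 1): "B2",
                (10, 1): "B2", (12, 1): "B2"}
regions = corresponding_regions(pixel_pairs, slice_labels, frame_labels)
```

Here three of the four pixels of A1 land in A2 and one lands in B2, so A1 is paired with A2 even though the pixel-level matching is not perfectly consistent.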

Next, the CPU 104 calculates the distance value to be assigned to each pixel in each region of the frame image. As described above, this calculation is performed in two stages. The first is the calculation of a distance value for each region produced by the mean-shift segmentation (S.7 in FIG. 2). This distance value is called the global distance (out-of-region distance). The second is the calculation of a distance value for each pixel within each region (S.8 in FIG. 2). This distance value is called the local distance (intra-region distance).

The global distance is calculated first. The difference between the size of a region of the frame image segmented by the mean-shift method and the size of the corresponding region of the slice image is related to the distance from the camera 200 to the photographed object. When the object is far from the camera 200, the region of the slice image retains a fair size relative to the region of the frame image, and the compression ratio based on region size tends to be small. When the object is close to the camera 200, the region of the slice image becomes relatively small compared with the region of the frame image, and the compression ratio based on region size tends to be large. The compression ratio of the corresponding regions is therefore obtained from the ratio between the average horizontal length of the region in the slice image and the average horizontal length of the corresponding region in the frame image. The compression ratio can also be obtained by using the most frequent length instead of the average horizontal length.

For example, as shown in FIG. 12, consider a horizontal line segment within one region (region A2) of the frame image. If x(t2), corresponding to time t2 of the slice image, lies near the end point of the segment, and x(t1), corresponding to time t1, lies near its start point, then x(t2) − x(t1) is the difference in accumulated dynamic parallax over that interval. The length of the corresponding line segment in the corresponding region (region A1) of the slice image, on the other hand, is t2 − t1.

Let p be the average horizontal length of the corresponding region in the slice image and q the average horizontal length of the corresponding region in the frame image. The magnification of the frame image relative to the slice image is then q/p. In the line-to-image DP matching process of the image distance calculation device 100 of the present embodiment, the x axis of the frame image is associated with values up to four times the time t, as shown in FIG. 10 (x = 4T), so 1 ≤ q/p ≤ 4. If data relating the real-world distance from the camera 200 to the photographed object to the value of q/p can be prepared in advance, the distance of a segmented region (corresponding region) in the frame image can be obtained from the value of the ratio q/p. FIG. 13(b) shows an example of data relating the value of q/p to the actual distance from the camera 200 to the photographed object.

As for how p and q determined in a region r are used, the ratio value need not be q/p, the ratio of q to p; α_r = p/q, the ratio of p to q, may instead be obtained and used as the ratio value. In FIG. 13(b), the horizontal axis represents α_r (= p/q) and the vertical axis represents the distance z.
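The lookup from the ratio value to a distance can be sketched as follows. This is only an illustration: the calibration table `CALIBRATION` and its numbers are hypothetical placeholders (they are not read from FIG. 13(b)), and linear interpolation between table entries is an assumed choice rather than the distance function of the embodiment.

```python
from bisect import bisect_left

# Hypothetical calibration table of (alpha_r = p/q, distance z) pairs.
# The values are placeholders, NOT data taken from FIG. 13(b).
CALIBRATION = [(0.25, 2.0), (0.50, 5.0), (0.75, 12.0), (1.00, 30.0)]


def ratio_alpha(p, q):
    """Return alpha_r = p / q for the average region lengths p (slice) and q (frame)."""
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    return p / q


def distance_from_alpha(alpha, table=CALIBRATION):
    """Interpolate the distance z linearly for alpha_r, clamping at the table ends."""
    alphas = [a for a, _ in table]
    if alpha <= alphas[0]:
        return table[0][1]
    if alpha >= alphas[-1]:
        return table[-1][1]
    k = bisect_left(alphas, alpha)
    (a0, z0), (a1, z1) = table[k - 1], table[k]
    return z0 + (z1 - z0) * (alpha - a0) / (a1 - a0)
```

Because 1 ≤ q/p ≤ 4, the value α_r falls between 0.25 and 1, which is why the placeholder table spans that range.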

FIG. 14 is an image showing, as an example, the global distance of each divided region (corresponding region), obtained using the relational expression between p, q, and the distance z. In the image shown in FIG. 14, taking the regions divided by the mean-shift method as the basis, regions with a shorter global distance are displayed in brighter colors and regions with a longer global distance are displayed in darker colors. By checking the global distance that corresponds to the color of a divided region, the user can therefore judge the distance from the camera 200 to the object to be photographed for each divided region (corresponding region).

Next, calculation of the local distance will be described. Calculating the global distance yields a distance for each divided region (corresponding region). However, further processing is needed to obtain a detailed distance for each pixel within a divided region (corresponding region). To obtain this detailed per-pixel distance as a relative distance within the divided region (corresponding region), the CPU 104 performs a local distance calculation process.

Consider the line segments of corresponding divided regions in the frame image and the slice image. In each divided region, the start point and the end point of the line segment are already determined. This is because the correspondence between each divided region of the slice image, divided by the mean-shift method, and the corresponding region (divided region) of the frame image is already established, so the corresponding edges of the regions can be identified unambiguously. Accordingly, the pixel-to-pixel correspondence from the start point to the end point of the line segments (from the edge at one end to the edge at the other end) can be obtained by the conventionally used DP matching process with both ends fixed, followed by the backtrace process.

For example, let the line segment of the divided region of the slice image be a(i), where i = 1, 2, ..., I, and let the line segment of the corresponding region (divided region) of the frame image be b(j), where j = 1, 2, ..., J. If the local distance d(i, j) is defined as d(i, j) = |a(i) − b(j)|, then D(I, J) is obtained by performing the DP matching process according to the following Equation 8.

... Equation 8

After D(I, J) has been obtained based on Equation 8, the backtrace process is performed from (I, J) to (1, 1), whereby the correspondence between the elements of the two line segments, one in the divided region of the slice image and the other in the corresponding region of the frame image, can be obtained.
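A minimal sketch of fixed-endpoint DP matching followed by backtracing is given below. Equation 8 itself is not reproduced in this text, so the recurrence used here, D(i, j) = d(i, j) + min{D(i−1, j−1), D(i−1, j), D(i, j−1)}, is an assumed, commonly used form and may differ from the recurrence of the embodiment. The function name `dp_match` is hypothetical.

```python
def dp_match(a, b):
    """Fixed-endpoint DP matching of sequences a and b.

    Returns (D(I, J), path), where path is a list of 1-based (i, j) pairs
    running from (1, 1) to (I, J).
    """
    I, J = len(a), len(b)
    if I == 0 or J == 0:
        raise ValueError("both sequences must be non-empty")
    INF = float("inf")
    D = [[INF] * (J + 1) for _ in range(I + 1)]
    back = [[None] * (J + 1) for _ in range(I + 1)]
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance d(i, j)
            if i == 1 and j == 1:
                D[i][j] = d  # fixed start point
                continue
            # Assumed recurrence (not necessarily Equation 8 of the source).
            candidates = [
                (D[i - 1][j - 1], (i - 1, j - 1)),
                (D[i - 1][j], (i - 1, j)),
                (D[i][j - 1], (i, j - 1)),
            ]
            best, prev = min(candidates, key=lambda c: c[0])
            D[i][j] = best + d
            back[i][j] = prev
    # Backtrace from the fixed end point (I, J) to (1, 1).
    path = [(I, J)]
    while path[-1] != (1, 1):
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return D[I][J], path
```

For instance, matching `[1, 2, 3]` against `[1, 1, 2, 3, 3]` gives a total cost of 0, and the path shows which frame-image elements each slice-image element is stretched over.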

In this case, when the corresponding series on the j-axis is denoted a^*(1), a^*(2), a^*(3), ..., a^*(I), the difference a^*(j) − a^*(j−1) represents the local dynamic parallax (motion parallax). This local dynamic parallax is a dynamic parallax in units of pixels, and it makes it possible to obtain the distance in units of pixels within the corresponding region. That is, as described with reference to FIG. 7, the interval between adjacent pixels in the corresponding region of the frame image becomes wider or narrower according to the difference in dynamic parallax.

Specifically, a narrow interval between adjacent pixels indicates that the distance from the camera 200 to the object to be photographed is long, and a wide interval between adjacent pixels indicates that the distance from the camera 200 to the object to be photographed is short. The interval between adjacent pixels (inter-pixel distance) in the corresponding region (divided region) of the frame image therefore makes it possible to judge differences in relative distance within the corresponding region (divided region).
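The per-pixel intervals can be read off a backtrace path as sketched below. The sketch assumes a path given as 1-based (i, j) pairs, with i indexing the slice-image segment and j the frame-image segment; taking the largest j matched to each i as the corresponding series is an assumption made for illustration, and both function names are hypothetical.

```python
def correspondence_series(path, length_i):
    """For each slice index i = 1..length_i, return the largest frame index j matched to it."""
    series = [0] * length_i
    for i, j in path:
        series[i - 1] = max(series[i - 1], j)
    return series


def local_parallax(series):
    """Differences between successive matched frame positions (intervals in pixels)."""
    return [series[k] - series[k - 1] for k in range(1, len(series))]
```

A larger difference means a wider interval between adjacent pixels in the frame image, which the text above associates with a shorter distance to the object.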

Based on the global distance and the local distance obtained as described above, the distance from the camera 200 to the object to be photographed that appears in each pixel of the frame image can be obtained. Specifically, the CPU 104 calculates a detailed distance for each pixel of the frame image by adding the local distance obtained in the corresponding region (divided region) to the global distance of the corresponding region (divided region) that contains the pixel (S.9 in FIG. 2).

When the distance from the camera to the object to be photographed is actually calculated for each pixel of the frame image, it is preferable to deal with the occlusion described above. In the present embodiment, the slice image is generated from video captured over the range of time t from 1 to 175. That is, the slice image is generated from 175 frame images, for time t from 1 to 175. As a result, an object that appears in the frame image may not appear in the slice image, or conversely, an object that does not appear in the frame image may appear in the slice image. Such occlusion is likely to occur more frequently as the duration of the video used to generate the slice image increases. When occlusion occurs, the accuracy of the correspondence between the corresponding regions of the frame image and the divided regions of the slice image may deteriorate.

FIGS. 15(a) to 15(h) show a case in which the coordinates x^S_0 (S = 1, 2, 3, ...) on the x-axis of the frame image are determined one after another, and the distance data for each pixel of the frame image (the pixels of the frame image corresponding to the slice image) is calculated in order using a plurality of slice images. Let x(T, y) be the spotting point of the frame image first calculated by the matching process based on dynamic programming (this spotting point x(T, y) corresponds to the pixel (T, y) of the slice image). The sequence of spotting points along the y-axis, (x(T, 1), x(T, 2), x(T, 3), ..., x(T, y), ..., x(T, Y)), is smoothed using a median filter. After that, dynamic programming matching over fixed intervals is performed between the interval [1, T] of the next slice image and the interval [x_0, x_0 + x(T, y)] of the next frame image corresponding to that slice image interval, and the corresponding points of the frame image within the interval are calculated. By repeating this process, the distance data for each corresponding pixel of the frame image is calculated in order using a plurality of slice images. The minimum value of the spotting points of the frame image obtained after smoothing with the median filter becomes the start value x_0 of the frame image interval in the next iteration. In FIGS. 15(a) to 15(h), the coordinates x^S_0 (S = 1, 2, 3, ...) on the x-axis of the frame image are shown, where S is the number of times the iteration has been performed. FIGS. 15(a) to 15(h) show how the range of the frame image for which the distance has been calculated based on the slice images expands little by little.
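The smoothing of the spotting points and the choice of the next start value can be sketched as follows. The window size, the truncation of the window at both ends of the sequence, and the use of the lower median (to keep integer pixel coordinates) are assumptions for illustration; the function names are hypothetical.

```python
from statistics import median_low


def median_smooth(values, window=5):
    """1-D median filter; the window is truncated at both ends of the sequence."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        median_low(values[max(0, k - half):k + half + 1])
        for k in range(len(values))
    ]


def next_start(spotting, window=5):
    """Smooth x(T, 1..Y) and return (minimum of smoothed values, smoothed values)."""
    smoothed = median_smooth(spotting, window)
    return min(smoothed), smoothed
```

With a window of 3, an isolated outlier such as 90 in `[40, 41, 90, 42, 43, 41, 40]` is removed before the minimum is taken as the start value of the next frame image interval.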

FIG. 16(a) shows an image obtained by performing mosaicing on a plurality of images. FIG. 16(b) shows the image in which the global distance has been calculated for each region based on the image of FIG. 16(a), with the R, G, and B values of each pixel of the frame image added, rendered from an oblique viewpoint rather than from the front. As shown in FIG. 16(b), three-dimensional distance information is extracted for each region.

In addition, the length of the frame image in the horizontal-axis (x-axis) direction may grow as the camera moves. For this reason, a method can also be used in which a frame image taken after a fixed time has elapsed is adopted as a new frame image, a slice image is obtained based on the new frame image, and the distance is calculated for each pixel. By generating a slice image anew for each of a plurality of frame images and calculating the distance for each pixel in this way, the distance from the camera 200 to the object to be photographed can be calculated over a wider photographing range. When the distance for each pixel is calculated based on a plurality of frame images in this way, mosaicing must be performed while taking into account the range of pixels for which the distance has been calculated in each frame image.

However, each pixel of the images to be mosaiced has four element values in total, namely RGB information (an R value, a G value, and a B value) and distance information (a distance value), so the algorithm of the stitching process, which is the ordinary mosaicing method, cannot be used as it is. A new method is therefore proposed below.

Consider a case in which frame images that were captured at different times and share a common image portion are combined using an overlapping process. A stitching algorithm is generally known as a method of generating one image from two images by applying an overlapping process to the common image portion. The stitching algorithm is a widely known image combining technique and is provided by OpenCV (Open Source Computer Vision Library), a widely published open-source library for computer vision. Anyone can therefore use the stitching algorithm. The stitching algorithm performs the combining process using the color information (hereinafter referred to as RGB information) of the images to be combined.

As already described, in a frame image that has undergone the matching process with a slice image, distance information is added to the corresponding pixels. Such a frame image is therefore characterized in that RGB information is attached to every pixel and, in addition, distance information is attached to the pixels that were subject to matching.

However, the stitching algorithm described above combines images based only on RGB information. If two frame images are simply combined using the stitching algorithm, the combining process is performed without any consideration of the distance information. It therefore could not be concluded that the distance information of the frame images before combining was sufficiently reflected (or maintained) in the combined frame image.

The following therefore describes a combining process that applies the stitching algorithm to two frame images in which RGB information and distance information are recorded, and generates a single panoramic image in which the correspondence of the distance information, as well as the RGB information, is sufficiently reflected (or maintained).

Two cases can be considered when combining frame images. The first is the case of combining frame images in which RGB information and the distance information of the divided regions are attached to each pixel. This applies, for example, when the image distance calculation apparatus 100 has obtained the correspondence between the regions of the slice image and the regions of the frame image and combines the frame images immediately after the global distance has been calculated for each region. In this case, the local distance has not been calculated for the individual pixels within a region. The distance information of the pixels can therefore be regarded as indicating the same distance value throughout each region.

The second is the case of combining frame images in which RGB information and detailed distance information are attached to all pixels. This applies, for example, when not only the global distance but also the local distance within each region has been calculated for each pixel, and frame images in which a detailed distance value has been calculated for each pixel by adding the local distance to the global distance are combined. In this case, every pixel of the frame image carries the detailed distance (global distance + local distance) from the object to be photographed that appears in the pixel to the camera 200.

The combining process that takes distance information into account will be described separately for the two cases above.

(1) Case of combining frame images in which RGB information and the distance information of the divided regions are attached to each pixel
FIG. 17 is a flowchart showing the process of combining frame images in which RGB information and the distance information of the divided regions are attached to each pixel (first combining process). The CPU 104 of the image distance calculation apparatus 100 reads the RGB information of all pixels of the two frame images to be combined (S.11 in FIG. 17). The CPU 104 then assigns the read RGB information to an RGB space consisting of an R axis, a G axis, and a B axis (S.12 in FIG. 17).

FIG. 18 shows a state in which the RGB information of all pixels of the two frame images has been assigned to the RGB space consisting of the R axis, the G axis, and the B axis. As shown in FIG. 18, even when the RGB information of all pixels of the frame images is assigned to the RGB space, some coordinates of the RGB space remain entirely unused. For example, the RGB values at positions near the outer periphery of the RGB space are not used at all in the two frame images. A point in the RGB space whose R, G, and B values are not used in the frame images is referred to as a code.
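Finding such codes can be sketched as follows, assuming each image is given as a list of rows of (R, G, B) tuples with 8-bit channels. Searching only a coarse grid of candidate colors (parameter `step`) is an assumption made to keep the sketch short and to keep the codes well separated from one another; the function name `find_codes` is hypothetical.

```python
from itertools import product


def find_codes(images, count, step=51):
    """Return `count` RGB triples that no pixel of the given images uses.

    images: list of images, each a list of rows of (R, G, B) tuples.
    step:   spacing of the candidate grid on each axis (assumed value).
    """
    used = {px for img in images for row in img for px in row}
    codes = []
    for rgb in product(range(0, 256, step), repeat=3):
        if rgb not in used:
            codes.append(rgb)
            if len(codes) == count:
                return codes
    raise ValueError("not enough unused colors on the sampled grid")
```

In practice one would probably also require each code to lie some distance away from every used color, since the text later notes that the RGB values shift slightly during combining; that check is omitted here.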

As already described, the pixels of a frame image that lie within the same region are considered to have the same distance information (distance value). The CPU 104 therefore selects several pixels (for example, about three to five) in each region (S.13 in FIG. 17) and extracts the distance information of the selected pixels, that is, the global distance of the region in which the selected pixels lie (S.14 in FIG. 17, pixel distance value extraction step, pixel distance value extraction function).

Next, the CPU 104 extracts a plurality of pieces of RGB information (R, G, and B values) that qualify as codes (S.15 in FIG. 17, code detection step, code detection function). The CPU 104 then assigns the RGB values of the extracted codes to the distance values extracted for each region of the frame images (S.14 in FIG. 17) so that no code is assigned twice (S.16 in FIG. 17, code RGB value assignment step, code RGB value assignment function).

The CPU 104 then finds, among the pixels of the two frame images, the pixels whose distance value equals a distance value to which the RGB value of a code has been assigned, and replaces the RGB values of those pixels with the RGB value of the code assigned to that distance value (S.17 in FIG. 17, RGB value replacement step, RGB value replacement function).

The CPU 104 associates each RGB value after replacement with the distance value of the pixel whose RGB value was replaced, and records the pair in the RAM 103 or the recording unit 101 (S.18 in FIG. 17, pixel information recording step, pixel information recording function).
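The assignment, replacement, and recording steps (S.16 to S.18) can be sketched together as follows. The data layout (an image as a list of rows of RGB tuples, a dictionary from pixel position to distance value) and the function name `embed_distances` are assumptions for illustration.

```python
def embed_distances(image, selected, distance_of, codes):
    """Replace the RGB values of selected pixels with codes.

    image:       list of rows of (R, G, B) tuples (left unmodified).
    selected:    list of (y, x) positions chosen for replacement.
    distance_of: dict mapping (y, x) to the distance value of that pixel.
    codes:       unused RGB triples, one consumed per distinct distance value.
    Returns (new image, table mapping code RGB -> distance value).
    """
    code_for = {}  # distance value -> code RGB
    table = {}     # code RGB -> distance value (the recorded association)
    free = iter(codes)
    out = [list(row) for row in image]
    for (y, x) in selected:
        z = distance_of[(y, x)]
        if z not in code_for:
            try:
                code = next(free)
            except StopIteration:
                raise ValueError("not enough codes for the distance values") from None
            code_for[z] = code
            table[code] = z
        out[y][x] = code_for[z]
    return out, table
```

Pixels that share a distance value receive the same code, so the table returned here plays the role of the record kept in the RAM 103 or the recording unit 101.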

FIG. 19 shows one of the frame images after the RGB values of some of its pixels have been replaced with the RGB values of codes. FIG. 20 shows the other frame image after the RGB values of some of its pixels have been replaced with the RGB values of codes. Because the RGB values after replacement are values that are not used at all in the original frame images, the replaced pixels have colors clearly different from the colors (RGB values) of the other pixels in the same region.

As shown in FIGS. 19 and 20, in each of the two frame images to be combined, the RGB information of several pixels in each region is replaced with the RGB value of a code. Replacing the RGB values in this way creates RGB images (frame images) in which distance information (distance values) is associated with the RGB values of the codes.

The CPU 104 then combines the two created RGB images (frame images) by applying the stitching algorithm to them (S.19 in FIG. 17, combined image generation step, combined image generation function). For convenience of explanation, the image produced by the stitching algorithm is referred to as a combined image.

The combining process generates one combined image from the two RGB images. The combined image contains pixels whose distance information is associated with the RGB value of a code. The RGB values of these pixels tend to change slightly as a result of the combining process. However, the RGB values of the codes are values in the RGB space that are not used in the frame images, and they are assigned so that each distance value has its own code. Therefore, even if the RGB values have changed slightly during combining, the relevant pixels can easily be inferred and extracted from the RGB values of the combined image. The CPU 104 detects the pixels of the combined image whose RGB value (color value) matches or approximates one of the RGB values of the codes (code group) to which distance information has been assigned (S.20 in FIG. 17, RGB value detection step, RGB value detection function).

The CPU 104 then adds to each detected pixel, as the distance information of that pixel, the distance value associated with the RGB value recorded in the RAM 103 or the recording unit 101 (S.21 in FIG. 17, distance information addition step, distance information addition function).
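The detection and distance-addition steps (S.20 and S.21) can be sketched as a nearest-code search with a tolerance. The tolerance `tol`, the use of the maximum per-channel difference as the closeness measure, and the function name `recover_distances` are assumptions; the source only states that matching or approximating RGB values are detected.

```python
def recover_distances(combined, table, tol=10):
    """Return {(y, x): distance} for pixels whose color is close to a code.

    combined: list of rows of (R, G, B) tuples (the combined image).
    table:    dict mapping code RGB -> distance value.
    tol:      largest allowed per-channel difference (assumed value).
    """
    if not table:
        return {}

    def gap(code, px):
        # Largest per-channel difference between a code and a pixel color.
        return max(abs(code[k] - px[k]) for k in range(3))

    found = {}
    for y, row in enumerate(combined):
        for x, px in enumerate(row):
            best = min(table, key=lambda code: gap(code, px))
            if gap(best, px) <= tol:
                found[(y, x)] = table[best]
    return found
```

The tolerance should stay well below the spacing between codes and below the gap between any code and the colors actually used, otherwise ordinary pixels could be mistaken for coded ones.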

In this way, the RGB values of the pixels that carry distance information are replaced with RGB values that are not used at all in the frame images, and the combining process based on the stitching algorithm is performed afterwards. This makes it possible to combine two frame images while sufficiently reflecting (or maintaining) the distance information (distance values) as well as the RGB information (RGB values).

The color information (RGB information) of the pixels to which distance information has been added in the combined image is RGB color information that was not used in the frame images before the stitching algorithm was applied. These pixels are therefore displayed in colors (RGB values) clearly different from those of the surrounding pixels. For this reason, the CPU 104 replaces the RGB value of each pixel to which distance information has been added with the average of the RGB values of the neighboring pixels (for example, the four or eight surrounding pixels) (S.22 in FIG. 17, RGB value changing step, RGB value changing function). Replacing the RGB value of such a pixel with the average of the RGB values of its neighbors removes the visible mismatch between the color information (RGB value) of the pixel and the surrounding colors.
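The neighborhood averaging of S.22 can be sketched as follows for the four-neighbor case. Excluding neighbors that are themselves coded pixels and rounding the mean to integers are assumptions not stated in the source; the function name `smooth_coded_pixels` is hypothetical.

```python
def smooth_coded_pixels(image, coded):
    """Replace each coded pixel with the rounded mean RGB of its 4-neighbors.

    image: list of rows of (R, G, B) tuples (left unmodified).
    coded: iterable of (y, x) positions that carry a code color.
    Neighbors that are themselves coded are skipped (assumed behavior).
    """
    coded = set(coded)
    h, w = len(image), len(image[0])
    out = [list(row) for row in image]
    for (y, x) in coded:
        nbrs = [
            image[ny][nx]
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in coded
        ]
        if nbrs:  # leave the pixel unchanged if no usable neighbor exists
            out[y][x] = tuple(
                round(sum(p[k] for p in nbrs) / len(nbrs)) for k in range(3)
            )
    return out
```

The distance value attached to the pixel is kept separately, so only the displayed color changes.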

After the RGB values of the pixels to which code RGB values were assigned have been replaced in the combined image with the average of the RGB values of the neighboring pixels, the mean-shift method is applied again to the combined image. Applying the mean-shift method makes it possible to obtain the divided regions of the frame image based on the RGB information. FIG. 21 shows, as an example, a slice image in which region division has been performed by applying the mean-shift method to the combined image. Furthermore, by averaging the distances of the pixels within a region to which distance information has been added, the distance of each region (global distance) can be obtained.
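The per-region averaging can be sketched as follows, assuming the region division has already produced a label for each pixel (the mean-shift segmentation itself is not shown). The data layout and the function name `global_distances` are assumptions for illustration.

```python
from collections import defaultdict


def global_distances(labels, distances):
    """Average the distance values of the pixels that carry one, per region.

    labels:    list of rows of region identifiers, one per pixel.
    distances: dict mapping (y, x) to a distance value, only for pixels
               to which distance information has been added.
    Returns {region identifier: mean distance}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # region -> [sum of distances, count]
    for (y, x), z in distances.items():
        acc = sums[labels[y][x]]
        acc[0] += z
        acc[1] += 1
    return {region: total / n for region, (total, n) in sums.items()}
```

Regions that contain no pixel with distance information simply do not appear in the result.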

(2) Case of combining frame images in which RGB information and detailed distance information are attached to all pixels
FIG. 22 is a flowchart showing the process of combining frame images in which RGB information and distance information are attached to each pixel (second combining process). First, as in "(1) Case of combining frame images in which RGB information and the distance information of the divided regions are attached to each pixel" described above, the CPU 104 reads the RGB information of all pixels of the two frame images to be combined (S.31 in FIG. 22). The CPU 104 then assigns the read RGB information to the RGB space consisting of the R axis, the G axis, and the B axis (S.32 in FIG. 22). Even after the RGB information has been assigned to the RGB space, some coordinates of the RGB space remain entirely unused. As already described, a point in the RGB space whose R, G, and B values are not used in the frame images is referred to as a code.

In the frame images to be combined here, RGB information (RGB values) and distance information (distance values) are attached to all pixels. This distance information does not represent the distance of a region. Therefore, unlike method (1) described above, the technique of selecting several pixels that share the same distance information cannot be used.

For this reason, the CPU 104 randomly selects a fixed proportion of the pixels of the two frame images to be combined, for example 5% of all pixels (1/N = 5% when N = 20, where N is a positive number) (S.33 in FIG. 22), and extracts the distance information (distance values) of the selected pixels (S.34 in FIG. 22, pixel distance value extraction step, pixel distance value extraction function).
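The random selection of 1/N of the pixels can be sketched as follows for a single image. Sampling without replacement, rounding the count down, and the optional seed are assumptions for illustration; the function name `sample_pixels` is hypothetical.

```python
import random


def sample_pixels(height, width, n=20, seed=None):
    """Randomly pick 1/n of the pixel positions of a height x width image.

    Returns a list of distinct (y, x) positions; the count is rounded down.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rng = random.Random(seed)
    coords = [(y, x) for y in range(height) for x in range(width)]
    return rng.sample(coords, int(len(coords) / n))
```

With n = 20, a 10 × 20 image yields 10 selected positions, that is, 5% of its 200 pixels.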

Next, the CPU 104 extracts a plurality of pieces of RGB information (R, G, and B values) that qualify as codes (S.35 in FIG. 22, code detection step, code detection function). The CPU 104 then assigns the RGB values of the extracted codes to the distance values extracted for the individual pixels so that no code is assigned twice (S.36 in FIG. 22, code RGB value assignment step, code RGB value assignment function).

The CPU 104 then finds, among the pixels of the two frame images, the pixels whose distance value equals a distance value to which the RGB value of a code has been assigned, and replaces the RGB values of those pixels with the RGB value of the code assigned to that distance value (S.37 in FIG. 22, RGB value replacement step, RGB value replacement function). Replacing the RGB values in this way creates RGB images (frame images) in which distance information is associated with the RGB values of the codes.

The CPU 104 associates each RGB value after replacement with the distance value of the pixel whose RGB value was replaced, and records the pair in the RAM 103 or the recording unit 101 (S.38 in FIG. 22, pixel information recording step, pixel information recording function).

The CPU 104 then applies the stitching algorithm to the two RGB images (frame images) in which the color information (RGB values) of 5% of the pixels has been replaced, thereby combining the two RGB images (S.39 in FIG. 22; combined image generation step, combined image generation function). As already described, an image combined by the stitching algorithm is referred to as a combined image.

The combining process generates one combined image from the two RGB images. In the combined image, the pixels that carry distance information linked to a code RGB value make up 5% of all pixels. As described above, the RGB values of these linked pixels tend to change slightly as a result of the combining process. The CPU 104 therefore detects, in the combined image, the pixels whose RGB value (color value) matches or approximates one of the RGB values of the codes (code group) to which distance information has been assigned (S.40 in FIG. 22; RGB value detection step, RGB value detection function).

The CPU 104 then adds to each detected pixel, as that pixel's distance information, the distance value linked to the corresponding RGB value recorded in the RAM 103 or the recording unit 101 (S.41 in FIG. 22; distance information addition step, distance information addition function).
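Steps S.35 to S.41 can be sketched roughly as follows. This is a minimal illustration, not the procedure claimed in the patent: the function names, the coarse color grid (`step`), and the matching `tolerance` are all assumptions introduced here, and the stitching step itself is omitted.

```python
def unused_colors(used, count, step=8):
    """Return `count` RGB triples from a coarse grid that do not occur in `used`.

    The grid spacing `step` is an assumption: it keeps codes far enough apart
    that a slightly altered code is still closest to its own original value.
    """
    if count == 0:
        return []
    found = []
    for r in range(0, 256, step):
        for g in range(0, 256, step):
            for b in range(0, 256, step):
                if (r, g, b) not in used:
                    found.append((r, g, b))
                    if len(found) == count:
                        return found
    raise ValueError("not enough unused colors on the grid")


def assign_codes(distance_values, used):
    """Map one unused code color to each distinct distance value (cf. S.36)."""
    distinct = sorted(set(distance_values))
    codes = unused_colors(used, len(distinct))
    return dict(zip(codes, distinct))


def lookup_distance(rgb, code_to_distance, tolerance=3):
    """Return the distance linked to the code nearest to `rgb` (cf. S.40, S.41).

    Returns None when no code lies within `tolerance` on every channel.
    """
    if not code_to_distance:
        return None
    best = min(
        code_to_distance,
        key=lambda code: sum((a - b) ** 2 for a, b in zip(code, rgb)),
    )
    if max(abs(a - b) for a, b in zip(best, rgb)) <= tolerance:
        return code_to_distance[best]
    return None
```

The tolerance should stay well below half the grid spacing; otherwise a color that drifted during stitching could be attributed to a neighboring code.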

In the combined image, a pixel to which distance information has been added is displayed in a color (RGB value) clearly different from that of the surrounding pixels. The CPU 104 therefore replaces the RGB value of each pixel to which distance information has been added with the average of the RGB values of neighboring pixels (for example, the four or eight surrounding pixels) (S.42 in FIG. 22; corrected combined image generation step, corrected combined image generation function). Replacing the RGB value with the neighborhood average removes the visible mismatch between the color of that pixel and its surroundings. A combined image whose RGB values have been corrected in this way is referred to as a corrected combined image.
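The neighborhood averaging of S.42 might look like the sketch below, which uses the four-neighbor variant. The image representation (a list of rows of RGB tuples) and the function name are assumptions made for illustration.

```python
def neighbor_mean_color(image, row, col):
    """Return the mean RGB value of the 4-neighbors of (row, col).

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Neighbors that fall outside the image are ignored.
    """
    height, width = len(image), len(image[0])
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    inside = [image[r][c] for r, c in candidates if 0 <= r < height and 0 <= c < width]
    # Average each channel separately and round back to an integer value.
    return tuple(round(sum(channel) / len(inside)) for channel in zip(*inside))
```

A caller would write the returned color back into the code pixel. If a neighbor is itself a code pixel, its artificial color would leak into the average, so a fuller implementation would probably skip such neighbors.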

In this way, the RGB values of a randomly selected subset of pixels (5% of all pixels) are replaced with RGB values that are not used anywhere in the frame images, and the stitching algorithm is then applied. This makes it possible to combine two frame images while reflecting (or preserving) not only the RGB information but also the distance information.

However, although 5% of the pixels of the corrected combined image carry both RGB information and distance information, the remaining 95% carry only RGB information, so distance information is not yet reflected (or preserved) for all pixels.

After the RGB value replacement (S.42 in FIG. 22), the CPU 104 records the corrected combined image in the RAM 103 or the recording unit 101 (S.43 in FIG. 22). The CPU 104 then determines whether every pixel of the two frame images has been selected by the process of randomly selecting 5% of all pixels (S.33) (S.44 in FIG. 22). If not all pixels have been selected (No in S.44 of FIG. 22), the CPU 104 sets the pixels that have not yet been selected as the selection candidates for S.33 (S.45 in FIG. 22) and returns the processing to S.33. In this way, as long as unselected pixels remain, pixels amounting to 5% of all pixels of the frame image are randomly selected from the pixels not yet selected (S.33 in FIG. 22), and the corrected combined image generation described above (S.34 to S.44 in FIG. 22) is repeated.

When all pixels have been selected (Yes in S.44 of FIG. 22), the CPU 104 reads out all the corrected combined images recorded in the RAM 103 or the recording unit 101 (20 corrected combined images when 5% of all pixels are selected each time) (S.46 in FIG. 22). Each of the 20 corrected combined images carries distance information on pixels that do not overlap those of the other corrected combined images, and in each one distance information is attached to 5% of its pixels. By superimposing the 20 corrected combined images, the CPU 104 therefore obtains distance information for all pixels without overlap (S.47 in FIG. 22). The CPU 104 then adds the distance information of all pixels to one corrected combined image, generating a corrected combined image in which both RGB information and distance information are attached to every pixel (S.48 in FIG. 22; distance-added combined image generation step, distance-added combined image generation function).
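The loop over S.33 to S.47 amounts to partitioning the pixels into N random, non-overlapping batches and later overlaying the sparse distance maps obtained for each batch. The sketch below illustrates only that bookkeeping; pixels are represented by integer indices, sparse maps by dictionaries, and both function names are hypothetical. It also assumes that pixel positions are already expressed in the coordinates of the combined image.

```python
import random


def random_disjoint_batches(num_pixels, n_batches, seed=0):
    """Split pixel indices 0..num_pixels-1 into n_batches random, disjoint groups.

    With n_batches = 20, each group holds about 5% of the pixels (cf. S.33, S.45).
    """
    rng = random.Random(seed)
    order = list(range(num_pixels))
    rng.shuffle(order)
    # Taking every n-th element yields n groups whose sizes differ by at most one.
    return [order[k::n_batches] for k in range(n_batches)]


def merge_sparse_distances(sparse_maps):
    """Overlay sparse {pixel_index: distance} maps into one dense map (cf. S.47)."""
    merged = {}
    for sparse in sparse_maps:
        for pixel, distance in sparse.items():
            if pixel in merged:
                raise ValueError(f"pixel {pixel} carries distance in two maps")
            merged[pixel] = distance
    return merged
```

Shuffling once and slicing guarantees that every pixel is selected exactly once, which is the property the repeated selection from not-yet-selected pixels is meant to achieve.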

As described above, by replacing the RGB values of pixels that carry distance information with code RGB values and then applying the stitching algorithm, a plurality of frame images can be combined while taking both RGB information and distance information into account. A single panoramic image can therefore be generated from a moving image of a wide area captured by the camera.

For example, when distances to objects are obtained from a moving image of a wide landscape captured while the camera moves, a plurality of frame images in which RGB information and distance information are recorded can be extracted according to the capture time of the moving image. Frame images that are adjacent in time share a common image portion. By combining the frame images using the common portion as a reference, the wide captured area can be made into a single panoramic image, as described above. Using this panoramic image, the distances to the objects shown in it can be obtained over a wide area.

As described above, the CPU 104 of the image distance calculation device 100 according to the present embodiment obtains, from a moving image captured by a single moving camera, a frame image at a specific time of that moving image. It then generates a slice image from the vertical axis (y-axis) of the frame image and the time axis (t-axis) of the moving image, taking some x-coordinate on the horizontal axis (x-axis) of the frame image as the reference. Next, it determines, by line-to-image DP matching, the correspondence between the pixels of the slice image at time t and the pixels of the pixel column (line) on the vertical axis (y-axis) at a given x-coordinate of the frame image, and calculates the spotting point in the frame image. The CPU 104 then establishes the pixel-by-pixel correspondence between the frame image and the slice image by backtracing from the spotting point.

The CPU 104 then segments the frame image and the slice image into regions by applying the mean-shift method to each, and, based on the pixel-by-pixel correspondence between the two images, determines which segmented region of the frame image corresponds to each segmented region of the slice image. The CPU 104 obtains a global distance and a local distance in each corresponding region of the frame image and adds them together, which makes it possible to calculate, for each pixel of the frame image, the distance from the camera 200 to the object shown at that pixel.

In particular, the image distance calculation device 100 according to the present embodiment can calculate a distance for each pixel of a frame image from a moving image captured by just one camera. Unlike the conventional stereo vision method, there is no need to photograph the object simultaneously with two cameras, nor to keep the spacing between two cameras constant. Compared with calculating the distance to the object by the conventional stereo vision method, the equipment is simpler and the burden of shooting is lighter.

Moreover, as long as the moving image was captured by a single camera moving in some direction relative to the object, a frame image and a slice image can easily be generated from the video data.

Furthermore, in a moving image captured while the camera 200 moves in some direction relative to the object, the accumulated dynamic parallax produced by the movement is recorded in the slice image as a compressed image (pixels). The distance for each pixel of the frame image can therefore be obtained simply from a moving image captured by an ordinary camera, without dedicated imaging equipment.

Because per-pixel distances can be obtained simply from a moving image captured by an ordinary camera, they can also be calculated from, for example, moving images captured in the past. The distance from the camera to the object can thus be calculated from the vast amount of previously recorded video data, which makes it possible to reproduce the shooting environment at the time of capture.

In recent years, research on and applications of VR (virtual reality) technology have flourished. This technology gives users a simulated experience of a three-dimensional world by showing them, through goggles, images that exploit the parallax between the left and right eyes. The world experienced through VR merely appears three-dimensional; no actual three-dimensional world is realized. As an application of VR, the image distance calculation device 100 can calculate the distance to the objects shown in a moving image captured by a camera and form a three-dimensional space, so that a wide-area three-dimensional data world (indoor, outdoor, urban, mountainous, and so on) through which a person can actually move can be constructed. Building such a data world from camera video could substantially change the fields in which VR is applied and used. With the image distance calculation device 100 according to the present embodiment, such a three-dimensional space can be constructed easily.

Furthermore, because a three-dimensional space can easily be constructed from a moving image captured by an ordinary camera, it is possible, for example, to build three-dimensional data of a real streetscape from video captured by a traveling vehicle, or to build three-dimensional data of a wide area seen from the air from video captured by a camera mounted on a drone.

The image distance calculation device and the computer-readable non-transitory recording medium storing the image distance calculation program according to the present invention have been described in detail above with reference to the drawings, but they are not limited to the examples shown in the embodiment. A person skilled in the art could clearly conceive of various changes or modifications within the scope of the claims.

For example, the image distance calculation device 100 according to the embodiment has been described for the case in which the camera 200 moves laterally. However, with the image distance calculation device and the recording medium according to the present invention, the distance to the object can be calculated whenever a frame image and a slice image are generated from a moving image captured by a moving camera and the object shown in the frame image is recorded in the slice image in compressed form according to the movement of the camera.

Accordingly, the moving image is not limited to one captured with the camera moving laterally; the camera may move vertically or obliquely. Even when the camera moves with its lens facing obliquely (for example, facing diagonally forward left, diagonally forward right, diagonally backward left, or diagonally backward right relative to the direction of travel), the object shown in the frame image is recorded in the slice image in compressed form according to the movement of the camera, so the distance from the camera to the object can be calculated for each pixel.

The embodiment also described a method of calculating, for each region obtained by mean-shift segmentation, a global distance representing the distance from the camera 200 to the object. Specifically, letting p be the average horizontal length of a region of the slice image and q be the average horizontal length of the corresponding region of the frame image, the enlargement ratio q/p of the frame image relative to the slice image is obtained. The relation between the real-world distance from the camera 200 to the object in each region and the value of q/p is calculated theoretically (see FIG. 13(b)). Using this relation, the distance from the camera 200 to the object was obtained for each region. The range of distances needed to construct the relation is, in many cases, determined by human intuition rather than by direct measurement.

An example of the distance function is
    distance Z(p, q) = 1192.4 · exp(−0.366 (q/p)).
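As a small worked illustration, the example distance function can be evaluated directly. In the sketch below the two constants are the ones quoted in the example; the function name and the argument check are additions made here, and the unit of the result is whatever unit was used when the constants were determined.

```python
import math


def global_distance(p, q, a=1192.4, b=0.366):
    """Evaluate Z(p, q) = a * exp(-b * (q / p)).

    p: average horizontal length of a region in the slice image (pixels)
    q: average horizontal length of the corresponding frame-image region (pixels)
    """
    if p <= 0:
        raise ValueError("p must be positive")
    return a * math.exp(-b * (q / p))
```

The function decreases monotonically in q/p: the more a region is stretched in the frame image relative to the slice image, the nearer the object is taken to be.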

The following describes a method of determining the distance function on a new theoretical basis rather than by human intuition.

[Distance function for obtaining the global distance]
FIG. 23 shows the relationship between the pixels on the horizontal axis (t-axis direction) of the slice image from coordinate (1, y′) to coordinate (T, y′) and the corresponding pixels on the horizontal axis (x-axis direction) of the frame image. The pixels drawn as black circles in the frame image of FIG. 23 are denoted x(t). A black-circle pixel x(t) is the result of accumulating the dynamic parallax from time 1 to time t; that is, x(t) corresponds to the accumulated dynamic parallax. There are T black-circle pixels x(t), one for each time t = 1, 2, ..., T on the time axis of the slice image. The slice image has T pixels on its horizontal axis, whereas the frame image has more than T pixels on its horizontal axis. Consequently, not every pixel on the horizontal axis of the frame image is a black-circle pixel x(t).

FIG. 24 illustrates the relationship between dynamic parallax and accumulated dynamic parallax. As an example, the left part of FIG. 24 shows the correspondence between the pixels on the horizontal axis (t-axis) of the slice image from (1, y′) to (4, y′) and the pixels on the horizontal axis (x-axis) of the frame image from (x(1), y′) to (x(4), y′). The black circles in the left part of FIG. 24 are pixels of the frame image, and there are gaps between neighboring ones. In the slice image, by contrast, neighboring pixels are contiguous, so the four pixels form an unbroken run. For convenience of explanation, the pixels of the slice image are not drawn in the left part of FIG. 24.

When the moving camera 200 photographs an object, the shooting position moves at fixed time intervals, as shown in the left part of FIG. 24. The spacing between the black-circle pixels of the frame image in the left part of FIG. 24 corresponds to the change in shooting position, and this change corresponds to the dynamic parallax. The spacing between pixel (black-circle) positions in the frame image therefore represents the dynamic parallax with respect to the object for a one-pixel step in the slice image.

Because the dynamic parallax with respect to the object is represented by the spacing (inter-pixel length) between neighboring black-circle pixels, the position of each black-circle pixel in the frame image represents the dynamic parallax accumulated as the shooting position moves. The spacings differ from one another because the distance from the camera 200 to the object differs for each black-circle point.

The distance to the object directly in front of the camera also changes as the shooting position moves. As an example, the left part of FIG. 24 shows the distance from the camera 200 to the object changing as zv1, zv2, zv3, zv4 while the position of the black-circle pixel in the frame image changes from x(1) to x(4) with the shooting position of the camera 200.

The right part of FIG. 24 shows the gaps between black circles accumulated from x(1) to x(4), where x(1) to x(4) are the positions of the four frame-image pixels corresponding to the four points t = 1, 2, 3, 4 on the t-axis of the slice image. In the right part of FIG. 24, the interval obtained by subtracting the pixel position x(1) from the pixel position x(4) is drawn as the length of a horizontal line. This length equals the gaps between neighboring pixels (black circles) accumulated from x(1) to x(4), so it is the sum of the dynamic parallaxes from x(1) to x(4), that is, the accumulated dynamic parallax.
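The relation between the individual gaps and their accumulated total can be checked with a few lines of code. The sketch below is only an illustration; the function names are hypothetical and the positions x(1), ..., x(T) are passed as a plain list of x-coordinates.

```python
from itertools import accumulate


def dynamic_parallax(x):
    """Gaps between neighboring black-circle positions: x(t+1) - x(t)."""
    return [later - earlier for earlier, later in zip(x, x[1:])]


def accumulated_parallax(x):
    """Running sum of the gaps; the last entry equals x(T) - x(1)."""
    return list(accumulate(dynamic_parallax(x)))
```

For any list of positions, the final accumulated value telescopes to the difference between the last and first positions, which is the horizontal line drawn in the right part of FIG. 24.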

Note that the positions of these black circles are obtained after the fact, by backtracing from the spotting point x(T) that results from optimal matching of the slice image and the frame image by dynamic programming (DP). The distance from the camera 200 to the object group corresponding to this accumulated dynamic parallax (the objects shown at pixels x(1), x(2), and x(3) of the frame image) can be called a virtual distance. The accumulated dynamic parallax shown in the right part of FIG. 24 is the total of the dynamic parallaxes of three object points (the points of the objects shown at pixels x(1), x(2), and x(3) of the frame image), namely the dynamic parallaxes between pixels x(1) to x(4) of the frame image, and does not correspond to any one specific object point. The distance from the camera 200 corresponding to these three object points is defined as the virtual distance zv.

The virtual distance zv can be regarded as a distance that depends on the distances zv1, zv2, zv3 to the object at the three black circles (x(1), x(2), x(3)) in the left part of FIG. 24. The three pixel points of the slice image correspond to three object points, and the dynamic parallaxes for these three pixels are cumulatively added. The sum of the dynamic parallaxes does not correspond to the sum of the distances zv1, zv2, zv3 of the three object points. The same holds in the stereo vision method, where the sum of three parallaxes does not correspond to the sum of the three distances; in the stereo vision method, a parallax is obtained for a single object point. In the present embodiment, therefore, the distance corresponding to the accumulated dynamic parallax is treated as a virtual distance, which has meaning only as a quantity related to zv1, zv2, zv3. Because the virtual distance zv depends on the distances zv1, zv2, zv3 from the camera 200 to the object, it is a virtual quantity that does not directly represent a real distance. The conversion from the virtual distance to a real distance is described later.

FIG. 25 shows the model used to derive the formula relating the accumulated dynamic parallax to the actual distance. In FIG. 25, the virtual distance zv(t, x) is the distance obtained from the dynamic parallax accumulated from x(t0) to x(t) of the frame image. In other words, zv(t, x) is obtained from the accumulated dynamic parallax of the frame image corresponding to the pixels of the slice image. The virtual distance zv(t, x) corresponds to the global distance, that is, the distance from the camera 200 to the object obtained for each region. In FIG. 25, the vertical axis is the z-axis. The dynamic parallax accumulated from x(t0) to x(t) is denoted α(t, t0).

The accumulated dynamic parallax α(t, t0) satisfies
$$\alpha(t, t_0) = \sum_{\tau = t_0}^{t} \Delta x(\tau) \qquad \text{(Equation 9)}$$
Here, Δx(τ) denotes the dynamic parallax at each τ from τ = t0 to τ = t. Accumulating Δx(τ) from τ = t0 to τ = t gives the accumulated dynamic parallax α(t, t0).

Let Δα(t, t0) be a small increase in the accumulated dynamic parallax α(t, t0). Then Δα(t, t0) can be written as
    Δα(t, t0) = α(t + Δt, t0) − α(t, t0).

Suppose now that the accumulated dynamic parallax α(t, t0) increases by a small amount Δα(t, t0), where Δα(t, t0) > 0. Then Δα(t, t0) corresponds to x(t + Δt) − x(t), the small change in the spacing between neighboring pixels of the frame image. A larger spacing between neighboring pixels of the frame image thus means a larger dynamic parallax. Given how accumulated dynamic parallax behaves, a larger dynamic parallax means that the distance from the camera 200 to the object has become slightly shorter. In other words, the virtual distance zv(t, x) to the object can be regarded as having decreased by a small amount Δzv(t, x).

As is clear from the relationship shown in FIG. 25, the quantities zv(t, x), −Δzv(t, x), α(t, t0), and Δα(t, t0) defined in this way satisfy the following proportional relationship.
    zv(t, x) : α(t, t0) = −Δzv(t, x) : Δα(t, t0)
Here, if the accumulated dynamic parallax corresponding to the virtual distance zv(t, x) is set to α(t, t0) = 1, the above proportional relationship gives
    zv(t, x) : 1 = −Δzv(t, x) : Δα(t, t0)
so that −Δzv(t, x) can be considered to correspond to Δα(t, t0).

The reason why α(t, t0) = 1 must be set in the above proportional relationship is as follows. The virtual distance zv(t, x) and the accumulated dynamic parallax α(t, t0) are not in a simple inverse proportion. In the stereo vision method, distance and parallax are in a simple inverse proportion, because the method presupposes a single object point seen by two cameras and a constant inter-camera distance (baseline). In the embodiment, by contrast, the accumulated dynamic parallax, which plays the role of the parallax in the stereo vision method, corresponds to a plurality of object points. In addition, because a single moving camera is used, the inter-camera distance that is held constant in the stereo vision method is not constant. Furthermore, the dynamic parallax is accumulated optimally by dynamic programming (DP), so the accumulation is not a simple sum of individual correspondences in which one object point is observed by the camera at two positions; the optimal accumulation also takes the varying baseline into account. For these reasons, it is necessary to assume that the virtual distance zv(t, x) corresponds to a certain constant value of the accumulated dynamic parallax. Under this assumption, let Δα(t, t0) be the displacement of the accumulated dynamic parallax from this constant value, and let Δzv(t, x) be the resulting displacement of the virtual distance zv(t, x); the phenomenon of moving parallax can then be expressed as a proportional relationship. A differential equation is derived from this proportional relationship, and solving it yields a relational expression between the accumulated dynamic parallax and the distance that has two coefficients. These two coefficients are determined for each individual target by giving boundary conditions. The function whose coefficients have been determined by the boundary conditions gives the actual distance rather than a virtual distance.

Forming the following differential equation from the above proportional relationship and solving it gives
    −Δzv(t, x) = zv(t, x) · Δα(t, t0)
    Δzv(t, x) / zv(t, x) = −Δα(t, t0)
    log zv(t, x) = −α(t, t0) + c   (c is a constant)
From this, zv(t, x) can be expressed as
    zv(t, x) = a · exp(−b · α(t, t0))   ... Equation 10
Here, a and b are coefficients that are determined separately.
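The step from the logarithmic form to Equation 10 can be written out as follows. This is a sketch of one possible reading, not a statement taken from the specification: it assumes that the reference value of the accumulated dynamic parallax is a general constant α_c (a symbol introduced here for illustration) rather than 1, which is one way to account for the coefficient b.

```latex
\frac{\Delta z_v}{z_v} = -\frac{\Delta\alpha}{\alpha_c}
\;\Longrightarrow\;
\log z_v = -\frac{\alpha}{\alpha_c} + c
\;\Longrightarrow\;
z_v = e^{c}\,\exp\!\left(-\frac{\alpha}{\alpha_c}\right)
    = a\,\exp(-b\,\alpha),
\qquad a = e^{c},\quad b = \frac{1}{\alpha_c}.
```

With α_c = 1 this reduces to log zv = −α + c, that is, b = 1.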

Once the coefficients a and b are determined, the distance function zv(t, x) = a · exp(−b · α(t, t0)) can be regarded as an actual distance function that gives the actual distance, rather than a virtual distance function that gives a virtual distance. Equation 10 therefore shows that, given the constants a and b, the actual distance can be obtained from a function that has a theoretical basis. The distance obtained by the actual distance function corresponds to the global distance already described. Accordingly, if the global distance of the region to which the pixel x(t) belongs in the frame image is denoted zg, the global distance zg at the pixel x(t) can be expressed, based on Equation 10, as
    zg = a · exp(−b · α(t, t0))   ... Equation 11

The issue that arises when the distance is obtained by Equation 10 and Equation 11 is how to accumulate the dynamic parallax, that is, how to determine the summation interval [t0, t] (the range from t0 to t) of Equation 9.

In the method already described in the embodiment, the mean-shift method, which is a region segmentation technique, was applied to both the slice image and the frame image to obtain corresponding regions in the two images, and the summation interval was determined from each of the regions thus obtained.

FIG. 26 is a diagram for explaining the method already described, in which the distance z_region(r) to the object to be photographed is calculated for each region using the average horizontal length of a region r of the frame image and the average horizontal length of the corresponding region r of the slice image. In FIG. 26, z_region(r) denotes the distance to the object obtained for the region r of the frame image. Let L_1 be the number of section horizontal lines included in the region r of the frame image (that is, the number of horizontal lines, stacked vertically, that connect one end point of the region r to the other), and let L_2 be the number of section horizontal lines included in the region r of the slice image.

Let xa^r_max-min denote the average length between pixels from one end point to the other end point (the length of a section horizontal line) in the region r of the frame image. If the pixel position at one end of the i-th section horizontal line in the region r of the frame image is x^r_i,min and the pixel position at the other end is x^r_i,max, then xa^r_max-min for the region r of the frame image can be expressed as

$$xa^{r}_{\max\text{-}\min} = \frac{1}{L_1} \sum_{i=1}^{L_1} \left( x^{r}_{i,\max} - x^{r}_{i,\min} \right)$$

Similarly, let ta^r_max-min denote the average length between pixels from one end point to the other end point (the length of a section horizontal line) in the region r of the slice image. If the pixel position at one end of the i-th section horizontal line in the region r of the slice image is t^r_i,min and the pixel position at the other end is t^r_i,max, then ta^r_max-min for the region r of the slice image can be expressed as

$$ta^{r}_{\max\text{-}\min} = \frac{1}{L_2} \sum_{i=1}^{L_2} \left( t^{r}_{i,\max} - t^{r}_{i,\min} \right)$$

To obtain the distance from the camera 200 to the object to be photographed, first let p be the average horizontal length of a region r of the slice image and q be the average horizontal length of the corresponding region r of the frame image, and obtain the enlargement ratio α_r = q/p of the frame image relative to the slice image. The distance from the camera 200 to the object for each region is then obtained from the value of α_r = q/p, using the relational expression between the distance z and the accumulated dynamic parallax α_r shown in FIG. 13(b).

That is, in the method described above, the distance is obtained from the ratio of the average length of the section horizontal lines in the region r of the frame image to the average length of the section horizontal lines in the region r of the slice image. Accordingly, the value α_r obtained by dividing the "average length of the section horizontal lines in the region r of the frame image" by the "average length of the section horizontal lines in the region r of the slice image" can be expressed as
    α_r = xa^r_max-min / ta^r_max-min   ... Equation 12
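The ratio in Equation 12 can be sketched in Python as follows. The function names and the representation of each section horizontal line as a (min, max) pair of pixel positions are assumptions made for illustration, and the sample end points are hypothetical.

```python
def mean_span(segments):
    """Average length of the section horizontal lines of one region.

    segments is a list of (min_position, max_position) pairs, one pair per
    section horizontal line in the region.
    """
    return sum(hi - lo for lo, hi in segments) / len(segments)


def enlargement_ratio(frame_segments, slice_segments):
    """alpha_r of Equation 12: frame-image average over slice-image average."""
    return mean_span(frame_segments) / mean_span(slice_segments)


# Hypothetical end points for one region r.
frame_r = [(10, 30), (12, 36), (11, 27)]  # L_1 = 3 lines in the frame image
slice_r = [(5, 15), (6, 16)]              # L_2 = 2 lines in the slice image
print(enlargement_ratio(frame_r, slice_r))
```

The two regions may contain different numbers of section horizontal lines, which is why each average is taken separately before the ratio is formed.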

In other words, the global distance was calculated by regarding this α_r as the accumulated dynamic parallax α that corresponds to the distance zg from the camera 200 to the object to be photographed in the region r. Following this idea, consider determining the coefficients a and b by treating the accumulated dynamic parallax α(t, t0) in the distance zv(t, x) described above as the α_r given by Equation 12.

To determine the coefficients a and b, the variation intervals of the distance z_region(r) and of α_r must first be set. The variation interval of z_region(r) is the range over which the distance from the camera 200 to the object to be photographed varies. It is determined intuitively by a person who actually looks at the scene in the frame image captured by the camera 200, such as a city or town landscape or an indoor scene. If the near-side distance of the interval is z_N1 and the far-side distance is z_L1, the variation interval of z_region(r) can be written as z_N1 ≤ z_region(r) ≤ z_L1.

For example, if the scene is an urban landscape and a person judges the distance from the camera 200 to the nearest object to be 10 m and the distance to the farthest object to be 4 km, the variation interval of z_region(r) is [z_N1, z_L1] = [10 m, 4 km]. Of course, where possible, the variation interval can also be determined by directly measuring the distance to the object with a laser-based distance measuring device or the like.

The variation interval of α_r can be expressed as μ_1 ≤ α_r ≤ γ_1 using constants μ_1 and γ_1. As described above, α_r is the value obtained by dividing the "length between pixels of the section horizontal lines in the region r of the frame image" by the "length between pixels of the section horizontal lines in the region r of the slice image". The variation interval of α_r is therefore affected by factors such as the expansion/contraction ratio from the slice image to the frame image, as already described in the embodiment, and the value of α_r is set within 1 < α_r < 4. Accordingly, the variation interval of α_r satisfies 1 < μ_1 ≤ α_r ≤ γ_1 < 4.

As described above, the two coefficients a and b of the theoretically derived distance function for the virtual distance,
    zv(t, x) = a · exp(−b · α(t, t0))   ... Equation 10
are determined using the parameters of the variation intervals of z_region(r) and α_r. Here, z_N1, the minimum of the interval of z_region(r), corresponds to γ_1, the maximum of the interval of α_r, and z_L1, the maximum of the interval of z_region(r), corresponds to μ_1, the minimum of the interval of α_r. This correspondence is reasonable in view of the nature of accumulated dynamic parallax. When α_r is large, the spacing between adjacent pixels of the frame image is wide and the average length xa^r_max-min of the section horizontal lines is long, so the object is near and z_region(r) is small. Conversely, when α_r is small, the spacing between adjacent pixels of the frame image is narrow and the average length xa^r_max-min is short, so the object is far and z_region(r) is large.

Accordingly, the coefficients a and b can be obtained from the two equations
    z_L1 = a · exp(−b · μ_1)
    z_N1 = a · exp(−b · γ_1)

Setting the values of z_N1, z_L1, μ_1, and γ_1 and solving the above two equations for z_N1 and z_L1 gives the coefficients a and b as
    a = z_L1 · exp((μ_1 / (γ_1 − μ_1)) · log(z_L1 / z_N1))
    b = (1 / (γ_1 − μ_1)) · log(z_L1 / z_N1)
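The determination of a and b from the two boundary conditions can be sketched in Python as follows. The function names are hypothetical. The interval [10 m, 4 km] is the example given in the text, while μ_1 = 1.2 and γ_1 = 3.5 are illustrative values chosen only to satisfy 1 < μ_1 < γ_1 < 4.

```python
import math


def distance_coefficients(z_near, z_far, mu, gamma):
    """Solve z_far = a*exp(-b*mu) and z_near = a*exp(-b*gamma) for (a, b)."""
    if not (0 < z_near < z_far and mu < gamma):
        raise ValueError("expected 0 < z_near < z_far and mu < gamma")
    # b = (1 / (gamma - mu)) * log(z_far / z_near)
    b = math.log(z_far / z_near) / (gamma - mu)
    # a = z_far * exp((mu / (gamma - mu)) * log(z_far / z_near)) = z_far * exp(mu * b)
    a = z_far * math.exp(mu * b)
    return a, b


def real_distance(alpha, a, b):
    """Equation 11: zg = a * exp(-b * alpha)."""
    return a * math.exp(-b * alpha)


a, b = distance_coefficients(z_near=10.0, z_far=4000.0, mu=1.2, gamma=3.5)
for alpha_r in (1.2, 2.0, 3.5):
    print(f"alpha_r = {alpha_r}: zg = {real_distance(alpha_r, a, b):.1f} m")
```

Evaluating the function at μ_1 and γ_1 returns the far and near ends of the distance interval, which is a quick check that the coefficients satisfy both boundary conditions.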

Using the coefficients a and b obtained in this way, the distance zv(t, x) at the pixel x(t) is obtained by
    zv(t, x) = a · exp(−b · α(t, t0))   ... Equation 10
which yields the actual distance function for obtaining the distance for each region (the global distance zg),
    zg = a · exp(−b · α(t, t0))   ... Equation 11
This actual distance function is derived mathematically, as described above. Using it therefore makes it possible to determine the global distance on a theoretical basis, rather than from human observation of the object or intuition.

As described above, the coefficients a and b for calculating the distance zg of the actual distance function are obtained by setting the values of z_N1, z_L1, μ_1, and γ_1. Because Equation 11 is regarded as an actual distance function, the values of z_N1 and z_L1 consequently correspond to the variation range of the distance zg of Equation 11 at the pixel x(t). Likewise, the values of μ_1 and γ_1 correspond to the variation range of the accumulated dynamic parallax α(t, t0) of Equation 11 at the pixel x(t).

Even when the same object is photographed by the camera 200, the values of μ_1 and γ_1, the interval parameters of α_r, vary with the moving speed of the camera 200. FIGS. 27(a) and 27(b) are graphs showing the relationship between the distance z and α_r in the region r, using the values of z_N1, z_L1, μ_1, and γ_1. When the camera 200 moves slowly, the range from μ_1 to γ_1 lies closer to 1 overall, as shown in FIG. 27(a); when the camera 200 moves quickly, the range from μ_1 to γ_1 lies closer to 4 overall, as shown in FIG. 27(b). Changing the range from μ_1 to γ_1 in this way changes the value of the distance z for a given α_r. These changes in distance, however, are absorbed by the actual distance function.

As shown in the embodiment, when the global distance is calculated for each region of the frame image using the mean-shift method, which is a region segmentation technique, the distance value is constant within each region of the frame image. Using the actual distance function described above, however, makes it possible to obtain the distance from the camera 200 to the object to be photographed for each pixel of the frame image that corresponds to a pixel of the slice image.

This means that a distance value can be obtained for each pixel of a textured image (an image showing the surface state of the photographed object). Using pixels for which distance values have been obtained therefore facilitates texture mapping onto a three-dimensional image.

In the conventional approach to texture mapping onto a three-dimensional image, a three-dimensional space in which the object exists (called free space) is set, and a point of the object is assumed to exist at a point in that space. How to apply (set) a texture to the resulting set of object points was therefore a major problem. By using an image in which a distance value (distance information) is added to each pixel of the frame image, however, the texture can be applied using the distance values attached to the pixels, so this problem does not need to be considered.

Furthermore, frame images in which a distance value (distance information) is added to each pixel can be joined using the stitching algorithm already described to generate a single stitched image. Obtaining the distance value for each pixel based on the stitched image then makes it possible to obtain a wide-area three-dimensional image that is connected endlessly.

[Distance calculation for each pixel within a corresponding region of the frame image]
In the embodiment, a case was described in which, after the global distance is obtained for each region, a local distance indicating the relative distance within the region is obtained and added to the global distance, so that the distance from the camera 200 to the object to be photographed is obtained for each pixel of the frame image. However, once the distance value has been determined for each region of the frame image, the distance from the camera 200 to the object can also be obtained for each individual pixel in the region by a different method.

Because a frame image is one frame extracted from the moving image captured by the camera 200, the resolution of the frame image depends on the imaging performance of the camera. A typical video camera records color information consisting of RGB values for each pixel at, for example, about 1000 × 600 pixels or about 4000 × 2000 pixels. In a frame image composed of this many pixels, merely attaching the global distance to each pixel as per-region distance information does not give sufficient distance accuracy for the frame image as a whole. Ideally, a distinct distance value should in principle be attached to every pixel in a region, which makes the result more meaningful as a representation of the real world. A method for calculating the distance of each pixel within a region at a finer level is therefore described below.

The distance for each segmented region can be obtained by the global distance calculation method already described (the distance calculation method using the mean-shift method, which is a region segmentation technique). Let zg be the global distance obtained for the region r. The region r contains a number of section horizontal lines. As already described, on the horizontal axis of each section horizontal line there are a plurality of coordinate points obtained by the matching process with both ends fixed and the backtrace process within the region, and these are recorded as a sequence of points on the horizontal axis. Let the points obtained by the backtrace process be x(1), x(2), x(3), ..., x(i−1), x(i), ..., x(G). Let xa be the average length, in pixels, of the section horizontal lines included in the region r. Further, let x(i−1) and x(i) be two adjacent points among the points obtained by the backtrace process, where i is an integer satisfying 2 ≤ i ≤ G. The distance (pixel difference) between the adjacent pixels x(i) and x(i−1) can be expressed as x(i) − x(i−1).

Using the average length xa of the section horizontal lines, the distance x(i) − x(i−1) between two adjacent points, and the number G of coordinates obtained by the backtrace process, the detailed distance z(i) from the camera 200 to the object to be photographed at the pixel x(i) is determined by
    z(i) = zg + β(x(i) − x(i−1) − xa/G)   ... Equation 13
Here, β is a positive constant whose value is determined experimentally.

Because xa is the average length, in pixels, of the section horizontal lines included in the region r, xa/G represents the average pixel length (distance between pixels, or difference in coordinate position) between two adjacent points among x(1), x(2), x(3), ..., x(i−1), x(i), ..., x(G) in the region. More specifically, for the G pixel points on the horizontal axis in the region, it is the pixel length of the whole point sequence from x(1) to x(G) divided by G, that is, the average section pixel length between two adjacent points.

Here, the global distance zg of the region r is considered to be the average distance of the region r, and this average distance zg is considered to correspond to the average section pixel length between two adjacent pixels. Accordingly, when the pixel length between the pixel position x(i) and the pixel position x(i−1) in the region r is longer than the average pixel length between two points, that is, when x(i) − x(i−1) is larger than xa/G (x(i) − x(i−1) − xa/G > 0), the pixel x(i) can be considered to show an object located nearer (closer to the camera 200) than the average distance zg of the region r.

Conversely, when the pixel length between the pixel position x(i) and the adjacent pixel position x(i−1) in the region r is shorter than the average pixel length between two points, that is, when x(i) − x(i−1) is smaller than xa/G (x(i) − x(i−1) − xa/G < 0), the pixel x(i) can be considered to show an object located farther back (farther from the camera 200) than the average distance zg of the region r.

FIG. 28 is a diagram showing the relationship between the i-th pixel x(i) in the region and the distance z(i) at the pixel x(i). As described above, the distance value z(i) of the i-th pixel x(i) is obtained by
    z(i) = zg + β(x(i) − x(i−1) − xa/G)   ... Equation 13
Accordingly, when the distance value z(i) of the i-th pixel x(i) equals the global distance zg of the region r, the value of x(i) − x(i−1) − xa/G is zero. That is, the distance z(i) of a pixel x(i) for which x(i) − x(i−1) − xa/G = 0 is the distance zg. For a pixel x(i) for which x(i) − x(i−1) − xa/G < 0, the distance z(i) is shorter than the distance zg, and for a pixel x(i) for which x(i) − x(i−1) − xa/G > 0, the distance z(i) is longer than the distance zg.
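Equation 13 can be sketched in Python as follows. The function name, argument names, and sample values are hypothetical; in particular, β is an experimentally determined constant whose value the text does not give. The sketch applies Equation 13 literally, so the sign of the offset from zg is governed by β exactly as the equation is written.

```python
def detailed_distances(x, zg, xa, beta):
    """Apply Equation 13 to the backtraced points x(1), ..., x(G) of one line.

    x    : pixel positions obtained by the backtrace process (length G)
    zg   : global distance of the region
    xa   : average length, in pixels, of the section horizontal lines
    beta : positive constant, determined experimentally
    Returns z(i) for i = 2, ..., G (x(1) has no predecessor).
    """
    G = len(x)
    mean_gap = xa / G  # average spacing between adjacent points, as in the text
    return [zg + beta * ((x[i] - x[i - 1]) - mean_gap) for i in range(1, G)]


# Hypothetical values for one section horizontal line.
points = [0, 2, 4, 8, 10]
print(detailed_distances(points, zg=50.0, xa=10.0, beta=0.5))
```

A point whose spacing from its predecessor equals xa/G receives exactly zg, and only points whose spacing deviates from that average are shifted away from the global distance.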

In this way, by obtaining x(i) − x(i−1) − xa/G, the detailed distance z(i) at the pixel x(i) in the region r can be obtained with reference to the global distance zg, which can be regarded as the average distance of the region r.

[Method for directly obtaining the detailed distance of each pixel in the frame image]
In the embodiment, the distance for each region segmented by the mean-shift method (the global distance) was obtained first, and the relative distance of each pixel within the region (the local distance) was obtained afterward. The detailed distance of each pixel of the frame image was then obtained by adding the local distance of each pixel to the global distance of its region. That is, the global distance of each region was obtained as the first stage, the relative distance (local distance) of each pixel in the region was obtained as the second stage, and the final detailed distance of each pixel was obtained from these. Instead of obtaining the detailed distance of each pixel through such multi-stage processing, however, it is also possible to use a median filter to obtain the detailed distance of each pixel of the frame image in a single pass. In other words, the window size of the median filter corresponds to the region obtained by the mean-shift method. The method using a median filter is a simpler way of obtaining the distance.

As already described, FIG. 23 schematically shows the correspondence between each pixel on the horizontal axis (t-axis) of the slice image and the corresponding pixel on the horizontal axis (x-axis) of the frame image. In FIG. 23, a point y′ on the vertical axis of the slice image is fixed (that is, (x = 1, y = y′)), and the points on the horizontal axis of the slice image at y = y′, that is, the points from t = 1 to t = T on the horizontal axis t, are shown as black circles. For each point on the horizontal axis t of the slice image, the optimally matching pixel on the x-axis of the frame image at the fixed y = y′ is obtained by line-to-image continuous dynamic programming. The black circles shown in the frame image in FIG. 23 are the pixels obtained by continuous dynamic programming. When a black-circle pixel in the frame image is at the position x(i), for example, the dynamic parallax accumulated from x(1) = 1 to x(i) corresponds to x(i) − x(1).

This matching by line-to-image continuous dynamic programming is performed before the mean-shift method is applied to the frame image and the slice image (that is, before the region division). Therefore, by using the frame-image pixels shown as black circles in FIG. 23 (the pixels corresponding to the accumulated dynamic parallax) to obtain the distance at each pixel, the distance of each pixel can be obtained without region division and without multi-stage processing.

First, without considering the correspondence between divided regions, fix a point on the y-axis of the frame image at y′ and consider the accumulated-dynamic-parallax pixels (black circles) on the x-axis at y = y′. Let these pixels on the x-axis of the frame image (the pixels matched with the slice image) be x(1), x(2), ..., x(i−1), x(i), x(i+1), ..., x(T). Because the number of such pixels corresponds to the number of pixels on the horizontal axis (t-axis) of the slice image, there are T of them. Let zv(i, x) denote the distance from the object to be photographed to the camera 200 at pixel x(i), obtained as the result of a median filter with a given window size, where i = 1, 2, ..., T as above. As described below, zv(i, x) is obtained from the accumulated dynamic parallax at x(i) via a median filter with a given window size, so, like the distance zv(t, x) of Equation 10 above, it can be regarded as a virtual distance.

Let α(i) be the accumulated dynamic parallax at pixel x(i). α(i) is the accumulation, up to x(i), of the pixel lengths between adjacent pixel points (the difference in pixel position between two neighboring points). The dynamic parallax accumulated from x(i) to x(i+K) is the accumulation of the position differences (parallaxes) between adjacent pixels, and can be regarded as the sum of x(i+1) − x(i), x(i+2) − x(i+1), ..., x(i+K) − x(i+K−1). The value of the pixel length (the pixel difference between two points, i.e. the position difference between adjacent pixels) differs from one pixel interval to the next.
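The accumulation described here can be checked numerically. The following minimal sketch assumes the matched x-positions are available as a plain list; the sample values and the function names `pixel_lengths` and `accumulated_parallax` are illustrative and are not part of the described device. It shows that summing the adjacent differences from x(i) to x(i+K) telescopes to x(i+K) − x(i).

```python
def pixel_lengths(xs):
    """Return the adjacent differences x(i+1) - x(i) of the matched positions xs."""
    return [b - a for a, b in zip(xs, xs[1:])]


def accumulated_parallax(xs, i, k):
    """Sum of the k pixel lengths starting at index i (0-based)."""
    return sum(pixel_lengths(xs)[i:i + k])


# Hypothetical matched positions on the x-axis of the frame image.
xs = [1, 2, 4, 5, 9, 10, 13]
print(pixel_lengths(xs))               # adjacent pixel lengths
print(accumulated_parallax(xs, 1, 3))  # same value as xs[4] - xs[1]
```

Indices in the sketch are 0-based, whereas the text counts pixels from 1.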

Here, taking these K pixel lengths (position differences) into account, a median filter is used to obtain the median of the pixel lengths. The median obtained by applying the median filter to the K pixel-length values derived from pixel x(i) is denoted Med(i). Med(i) is the median of the values x(i+1) − x(i), x(i+2) − x(i+1), ..., x(i+K) − x(i+K−1).

As an example, consider the five pixels (accumulated dynamic parallax) x(i+1), x(i+2), x(i+3), x(i+4), x(i+5) following the reference pixel x(i). The five position differences (difference amounts: dynamic parallaxes) are x(i+1) − x(i), x(i+2) − x(i+1), x(i+3) − x(i+2), x(i+4) − x(i+3), and x(i+5) − x(i+4). Comparing these five differences, the third largest is Med(i). The value obtained in this way is the output of a median filter with a window of 5.
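The window-of-5 example can be reproduced in a few lines. This is a sketch under the assumption that the matched positions are given as a list; the values and the helper name `med` are hypothetical.

```python
from statistics import median


def med(xs, i, k):
    """Median of the k pixel lengths x(i+1)-x(i), ..., x(i+k)-x(i+k-1) (0-based i)."""
    window = [xs[j + 1] - xs[j] for j in range(i, i + k)]
    return median(window)


# Hypothetical positions x(i), x(i+1), ..., x(i+5).
xs = [10, 11, 14, 16, 17, 22]
# The five differences are 1, 3, 2, 1, 5; sorted they are 1, 1, 2, 3, 5.
print(med(xs, 0, 5))  # the third largest difference, 2
```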

Thus, by using Med(i), the accumulated dynamic parallax α(i) from x(i) to x(i+K) can be expressed as
    α(i) = Med(i)·K    ... Equation 14
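To illustrate how Equation 14 behaves along a whole row, the sketch below slides a window of K pixel lengths over a list of matched positions and returns Med(i)·K for each window. The function name `alpha_median` and the sample data are assumptions made for illustration only.

```python
from statistics import median


def alpha_median(xs, k):
    """Return Med(i) * k for every i that has a full window of k pixel lengths."""
    diffs = [b - a for a, b in zip(xs, xs[1:])]
    return [median(diffs[i:i + k]) * k for i in range(len(diffs) - k + 1)]


# Hypothetical matched positions containing one abrupt jump (4 -> 12).
xs = [1, 2, 3, 4, 12, 13, 14, 15]
print(alpha_median(xs, 5))
```

In this example every window contains the jump of 8, so the raw sum of the five pixel lengths would be 12 in each window, while Med(i)·K stays at 5. The median therefore suppresses an isolated outlier in the pixel lengths.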

On the other hand, letting Δα(i) be a small increment of the accumulated dynamic parallax, Δα(i) can be expressed as
    Δα(i) = α(i + Δi) − α(i)

The relationship between the accumulated dynamic parallax α(i) and the detailed distance zv(i, x) at pixel x(i) can be expressed through the relationship between a small increment Δα(i) of the accumulated dynamic parallax and the accompanying change in distance, −Δzv(i, x). As already described, the following correspondence holds from the characteristics of accumulated dynamic parallax.

    zv(i, x) : α(i) = −Δzv(i, x) : Δα(i)
When α(i) = 1, this can be written as
    zv(i, x) : 1 = −Δzv(i, x) : Δα(i)

Based on this correspondence, solving the relations
    −Δzv(i, x) = zv(i, x)·Δα(i)
    Δzv(i, x) / zv(i, x) = −Δα(i)
gives
    log zv(i, x) = −Δα(i) + c
and transforming this relation, the distance zv(i, x) can be obtained as
    zv(i, x) = a·exp(−b·α(i)),  a > 0, b > 0
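One way to read the step from the proportional relation to the exponential form is as the solution of a first-order differential equation. The sketch below assumes that the relation holds in the limit Δα → 0 and introduces a proportionality constant b > 0, with a = e^c. Both are assumptions made here to connect the relations, not statements taken from the text.

```latex
\begin{aligned}
\frac{dz_v}{d\alpha} &= -b\, z_v \\
\int \frac{dz_v}{z_v} &= -b \int d\alpha \\
\log z_v &= -b\,\alpha + c \\
z_v &= e^{c}\, e^{-b\alpha} = a\, e^{-b\alpha}, \qquad a = e^{c} > 0 .
\end{aligned}
```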

Here, α(i) can be written using the median filter output Med(i) described above as
    α(i) = Med(i)·K
Therefore, the distance zv(i, x) at x(i) can be written as
    zv(i, x) = a·exp(−b·Med(i)·K)    ... Equation 15

The values of the coefficients a and b can be obtained in the way already described.

Once the coefficients a and b have been obtained, the detailed distance at x(i) of the frame image can be obtained from the median of the dynamic parallax (Med(i)) using the real distance function. Letting z(i, x) be the distance at x(i) given by the real distance function, the real distance function can be written as
    z(i, x) = a·exp(−b·Med(i)·K)    ... Equation 16

Specifically, the range of the accumulated dynamic parallax Med(i)·K is set to μ_2 ≦ Med(i)·K ≦ γ_2 using constants μ_2 and γ_2, and the range of the distance z(i, x) of the real distance function at pixel x(i) is set to z_N2 ≦ z(i, x) ≦ z_L2 using constants z_N2 and z_L2. The coefficient a is then calculated by
    a = z_L2·exp((μ_2 / (γ_2 − μ_2))·log(z_L2 / z_N2))
and the coefficient b by
    b = (1 / (γ_2 − μ_2))·log(z_L2 / z_N2)
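The two coefficient formulas follow from requiring that the smallest parallax μ_2 maps to the largest distance z_L2 and the largest parallax γ_2 maps to the smallest distance z_N2. The following sketch checks this numerically; the parameter names `z_near` and `z_far` stand for z_N2 and z_L2, and the numeric ranges are hypothetical values chosen only for illustration.

```python
import math


def distance_coefficients(mu, gamma, z_near, z_far):
    """Coefficients of z = a * exp(-b * alpha) with z(mu) = z_far and z(gamma) = z_near."""
    b = math.log(z_far / z_near) / (gamma - mu)
    a = z_far * math.exp(mu * b)
    return a, b


def real_distance(med_i, k, a, b):
    """Equation 16: z(i, x) = a * exp(-b * Med(i) * K)."""
    return a * math.exp(-b * med_i * k)


# Hypothetical ranges: parallax in [5, 50] pixels, distance in [1, 100] (arbitrary units).
a, b = distance_coefficients(mu=5.0, gamma=50.0, z_near=1.0, z_far=100.0)
print(real_distance(1.0, 5, a, b))   # parallax 5 maps to the far limit
print(real_distance(10.0, 5, a, b))  # parallax 50 maps to the near limit
```

Because the function is monotonically decreasing in Med(i)·K, any parallax inside the set range yields a distance between the two limits.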

Then, using the coefficients a and b obtained in this way, the distance z(i, x) is calculated with the real distance function of Equation 16 above, which gives the detailed distance from the object to be photographed to the camera 200 at pixel x(i) of the frame image. The detailed distance at pixel x(i) of the frame image can therefore be obtained without the mean-shift region division described in the embodiment.

Here, pixel x(i) is a pixel point of the frame image, but because it is a pixel point corresponding to the accumulated dynamic parallax, only T such points exist in the frame image. Consequently, the distance z(i, x) can be obtained at only T pixel points, and it is difficult to obtain the distance directly for every pixel of the frame image. However, the pixels x(i) are texture pixels that determine the surface position of the object. The distance of a pixel for which no distance has been obtained can therefore be found by interpolation from the distance values of surrounding pixels whose distances are known. Interpolation generally means deriving, from a known sequence of numerical data, values that fill in each interval of the sequence, or providing a function that does so.

Accordingly, by interpolating the distances of pixels with no distance value from the distance values at the pixels x(i) of the frame image, the detailed distance of each pixel of the frame image can be obtained in a single pass, without region division, that is, without computing distance values in multiple stages region by region.
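The interpolation step can be sketched for a single image row as follows. The text does not specify the interpolation scheme, so linear interpolation is an assumption here, as is the choice to give pixels outside the known range the nearest known distance. The function name `interpolate_row` and the sample values are hypothetical.

```python
from bisect import bisect_right


def interpolate_row(known_x, known_z, width):
    """Linearly interpolate distances for x = 1..width from sparse known samples.

    known_x must be strictly increasing. Pixels outside the known range take the
    nearest known distance (an assumption; the text does not specify this case).
    """
    row = []
    for x in range(1, width + 1):
        j = bisect_right(known_x, x)
        if j == 0:
            row.append(known_z[0])
        elif j == len(known_x):
            row.append(known_z[-1])
        else:
            x0, x1 = known_x[j - 1], known_x[j]
            z0, z1 = known_z[j - 1], known_z[j]
            row.append(z0 + (z1 - z0) * (x - x0) / (x1 - x0))
    return row


known_x = [2, 4, 8]           # hypothetical matched pixels x(i)
known_z = [10.0, 20.0, 40.0]  # hypothetical distances z(i, x)
print(interpolate_row(known_x, known_z, 9))
```

Repeating this for every y′ would fill the whole frame image; a two-dimensional scheme that also uses neighboring rows is equally compatible with the description.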

Note that when the distance value of each pixel is calculated by extracting regions and establishing correspondences between them, the per-region distance information is in some cases more stable than the per-pixel distances obtained by interpolation. Depending on the video, therefore, obtaining per-region distance information first (by extracting regions and matching them) and then computing per-pixel distance values may yield more reliable distances than computing per-pixel values directly. When per-pixel distance values are actually computed, it is thus desirable to choose between the two approaches as appropriate: calculating the distance for each region, or calculating the per-pixel distance directly with a median filter. Which approach is more accurate is expected to depend on the application.

DESCRIPTION OF SYMBOLS
100 ... Image distance calculation device
101 ... Recording unit (pixel information recording unit)
102 ... ROM
103 ... RAM (pixel information recording unit)
104 ... CPU (frame image extraction unit, slice image generation unit, spotting point calculation unit, pixel matching unit, region division unit, corresponding region determination unit, global distance calculation unit, local distance calculation unit, detailed distance calculation unit, control unit, code detection unit, pixel distance value extraction unit, code RGB value assignment unit, RGB value replacement unit, composite image generation unit, RGB value detection unit, distance information addition unit, RGB value change unit, modified composite image generation unit, distance-added composite image generation unit)
200 ... Camera
210 ... Monitor

Claims

A computer-readable non-transitory recording medium storing an image distance calculation program for an image distance calculation device that calculates, based on a moving image captured by a single moving camera, the distance from the camera to an object to be photographed recorded in the moving image,
In the control unit of the image distance calculation device,
A frame image extraction function for extracting a frame image at an arbitrary time of the video image;
A slice image generation function for generating a slice image whose vertical axis is the y-axis and whose horizontal axis is the t-axis (1 ≦ t ≦ T), where, in the frame image, the axis extending in the moving direction of the camera is the x-axis and the axis orthogonal to the x-axis is the y-axis, by extracting the temporal change of the pixel column on the y-axis at the point x0 of the x-axis from time t0 + 1 to time t0 + T;
A spotting point calculation function for calculating, as a spotting point, the coordinates of the pixel of the frame image corresponding to the pixel at time T in the slice image, where the pixel of the slice image at time t (1 ≦ t ≦ T) is g(t, y) and the pixel in xyt space at time t0 at the point y′ (1 ≦ y′ ≦ Y) on the y-axis of the frame image is f(x, y′, t0) = r(x), by using a matching process based on dynamic programming to obtain the pixel r(x) of the frame image, located at an arbitrary point in the interval [1, X] of x, that corresponds to the pixel g(t, y) of the slice image;
A pixel matching function for obtaining the correspondence between each pixel from t = 1 to t = T on the t-axis of the slice image and the corresponding pixel of the frame image, by performing backtrace processing from time t = T to time t = 1 based on the spotting point calculated by the spotting point calculation function;
A region division function that performs region division of each image based on a common division criterion by applying a mean-shift method to each of the frame image and the slice image;
Based on the pixels existing in the divided region of the slice image divided by the region dividing function, the pixel of the frame image corresponding to the pixel of the slice image obtained by the pixel matching function is detected and detected. A corresponding area determination function for determining a divided area of the frame image corresponding to a divided area of the slice image as a corresponding area by obtaining a divided area of the frame image including the most pixels of the frame image;
In the corresponding region of the frame image determined by the corresponding region determination function, an average q of the number of pixels in the x-axis direction is detected, and in the corresponding divided region of the slice image, the number of pixels in the t-axis direction is detected. By detecting the average p, the ratio value obtained based on the ratio of q to p or the ratio of p to q is calculated for each corresponding region, and from the camera to the object to be photographed shown in the frame image A global distance calculation function for calculating the distance corresponding to the calculated ratio value as a global distance for each of the corresponding areas by using a distance function in which a correspondence relationship between the distance and the ratio value is predetermined. ,
wherein the image distance calculation program recorded on the computer-readable non-transitory recording medium causes the control unit to implement the functions above.

In the control unit,
A code detection function for extracting the RGB values of all pixels of two frame images that were captured by the camera at different times and that partially share a common image portion, and for detecting, as code RGB values, RGB values that do not correspond to any of the extracted RGB values;
A pixel distance value extraction function for extracting the distance value of the pixel for which the global distance has been calculated by the global distance calculation function from the pixels of the two frame images;
A code RGB value assignment function for assigning the RGB values of the code so as not to overlap each distance value extracted by the pixel distance value extraction function;
The RGB values of the pixels of the two frame images having the same distance value as the distance value to which the RGB value of the code is assigned by the code RGB value assignment function are assigned according to the distance value. RGB value replacement function for replacing the RGB value of the code;
A pixel information recording function for recording the RGB values after being replaced by the RGB value replacement function in association with the distance values of the pixels that have been replaced with the RGB values;
A composite image generation function for generating a single composite image by applying a stitching algorithm to the two frame images whose pixel RGB values have been replaced by the RGB value replacement function, thereby stitching the two frame images together;
An RGB value that matches or approximates the RGB value recorded by the pixel information recording function is detected from the RGB values of all the pixels of the composite image generated by the composite image generation function. RGB value detection function;
A distance for adding the distance value associated with the RGB value recorded by the pixel information recording function to the pixel having the RGB value detected by the RGB value detection function as distance information of the pixel. Information addition function,
An RGB value changing function for changing the RGB value of the pixel to which the distance information is added by the distance information adding function to an average value of RGB values of pixels around the pixel;
A computer-readable non-transitory recording medium in which the image distance calculation program according to claim 1 is recorded.

In the control unit,
A local distance calculation function for obtaining, using both-ends-fixed matching processing and backtrace processing based on dynamic programming, the correspondence between the pixels from the start boundary to the end boundary in the t-axis direction of a divided region of the slice image and the pixels from the start boundary to the end boundary in the x-axis direction of the corresponding region of the frame image, thereby determining the pixels in the corresponding region of the frame image that correspond to the respective pixels in the divided region of the slice image, and for calculating, as a local distance, the relative distance of each pixel in the corresponding region based on the intervals between the determined pixels in the x-axis direction;
A detailed distance calculation function for calculating the detailed distance from the camera to the object to be photographed for each pixel of the frame image, by adding the global distance of each corresponding region of the frame image calculated by the global distance calculation function to the local distance of each pixel of the frame image calculated by the local distance calculation function;
A computer-readable non-transitory recording medium in which the image distance calculation program according to claim 1 is recorded.

In the control unit,
A code detection function for extracting the RGB values of all pixels of two frame images that were captured by the camera at different times, that partially share a common image portion, and for which the distance from the object to be photographed to the camera has been calculated for each pixel by the detailed distance calculation function, and for detecting, as code RGB values, RGB values that do not correspond to any of the extracted RGB values;
A pixel distance value extraction function for randomly selecting 1 / N (N is a positive number) of the total number of pixels of the two frame images and extracting a distance value of the selected pixels;
A code RGB value assignment function for assigning the RGB values of the code so as not to overlap each distance value extracted by the pixel distance value extraction function;
The RGB values of the pixels of the two frame images having the same distance value as the distance value to which the RGB value of the code is assigned by the code RGB value assignment function are assigned according to the distance value. RGB value replacement function for replacing the RGB value of the code;
A pixel information recording function for recording the RGB values after being replaced by the RGB value replacement function in association with the distance values of the pixels that have been replaced with the RGB values;
A composite image generation function for generating a single composite image by applying a stitching algorithm to the two frame images whose pixel RGB values have been replaced by the RGB value replacement function, thereby stitching the two frame images together;
An RGB value that matches or approximates the RGB value recorded by the pixel information recording function is detected from among the RGB values of all the pixels of the composite image generated by the composite image generation function. RGB value detection function;
The distance value associated with the RGB value recorded by the pixel information recording function is added to the pixel having the RGB value detected by the RGB value detection function as distance information of the pixel. Distance information addition function,
A modified composite image generation function for generating a modified composite image in which RGB values are corrected, by changing the RGB value of each pixel to which the distance information has been added by the distance information addition function to the average of the RGB values of the surrounding pixels; and
A distance-added composite image generation function for generating a single composite image in which the distance information is added to all pixels, based on the N modified composite images generated by the modified composite image generation function,
In the pixel distance value extraction function, when the control unit extracts pixel distance values for the second and subsequent times, the total number of pixels from the pixels that have not been selected in the past from the two frame images. 1 / N number of pixels are randomly selected, and the distance value of the pixel is extracted.
The code RGB value assignment function, the RGB value replacement function, the pixel information recording function, the composite image generation function, the RGB value detection function, the distance information addition function, and the corrected composite image generation In relation to the function, N corrections are made by causing the control unit to repeatedly execute each function N times in order based on the distance value selected at the second time or later by the pixel distance value extraction function. Generate a stitched image,
In the distance-added composite image generation function, the distance information added to 1 / N pixels of the total number of pixels of the corrected composite image is read by overlapping all of the N corrected composite images. By causing the control unit to obtain distance information of all the pixels in the corrected composite image, and adding the obtained distance information to a single composite image, all the pixels have the distance. A non-transitory computer-readable recording medium having recorded thereon the image distance calculation program according to claim 3 for realizing generation of the one combined image to which information is added.

In the control unit,
The pixels in the corresponding region of the frame image that correspond to the respective pixels in the divided region of the slice image are obtained as x(1), x(2), ..., x(i), ..., x(G−1), x(G) (1 ≦ i ≦ G), by using both-ends-fixed matching processing and backtrace processing based on dynamic programming to obtain the correspondence between the pixels from the start boundary to the end boundary in the t-axis direction of a divided region of the slice image and the pixels from the start boundary to the end boundary in the x-axis direction of the corresponding region of the frame image,
The average number of pixels from the boundary of the start end in the x-axis direction to the boundary of the end in the corresponding region of the frame image is xa,
The distance between the pixel x (i) in the corresponding region of the frame image and the pixel x (i−1) close to the pixel x (i) obtained by the backtrace process is expressed as x (i) − x (i-1)
The global distance of the corresponding area calculated by the global distance calculation function is a distance zg,
The detailed distance from the object to be photographed to the camera at the pixel x(i) of the frame image is a distance z(i), and the distance z(i) is calculated, using a positive constant β, by
    z(i) = zg + β(x(i) − x(i−1) − xa/G),
wherein the image distance calculation program according to claim 1 recorded on the computer-readable non-transitory recording medium causes the control unit to implement a detailed distance calculation function that performs this calculation.

In the control unit,
A code detection function for extracting the RGB values of all pixels of two frame images that were captured by the camera at different times, that partially share a common image portion, and for which the distance from the object to be photographed to the camera has been calculated for each pixel by the detailed distance calculation function, and for detecting, as code RGB values, RGB values that do not correspond to any of the extracted RGB values;
A pixel distance value extraction function for randomly selecting 1 / N (N is a positive number) of the total number of pixels of the two frame images and extracting a distance value of the selected pixels;
A code RGB value assignment function for assigning the RGB values of the code so as not to overlap each distance value extracted by the pixel distance value extraction function;
The RGB values of the pixels of the two frame images having the same distance value as the distance value to which the RGB value of the code is assigned by the code RGB value assignment function are assigned according to the distance value. RGB value replacement function for replacing the RGB value of the code;
A pixel information recording function for recording the RGB values after being replaced by the RGB value replacement function in association with the distance values of the pixels that have been replaced with the RGB values;
A composite image generation function for generating a single composite image by applying a stitching algorithm to the two frame images whose pixel RGB values have been replaced by the RGB value replacement function, thereby stitching the two frame images together;
An RGB value that matches or approximates the RGB value recorded by the pixel information recording function is detected from among the RGB values of all the pixels of the composite image generated by the composite image generation function. RGB value detection function;
The distance value associated with the RGB value recorded by the pixel information recording function is added to the pixel having the RGB value detected by the RGB value detection function as distance information of the pixel. Distance information addition function,
A modified composite image generation function for generating a modified composite image in which RGB values are corrected, by changing the RGB value of each pixel to which the distance information has been added by the distance information addition function to the average of the RGB values of the surrounding pixels; and
A distance-added composite image generation function for generating a single composite image in which the distance information is added to all pixels, based on the N modified composite images generated by the modified composite image generation function,
In the pixel distance value extraction function, when the control unit extracts pixel distance values for the second and subsequent times, the total number of pixels from the pixels that have not been selected in the past from the two frame images. 1 / N number of pixels are randomly selected, and the distance value of the pixel is extracted.
The code RGB value assignment function, the RGB value replacement function, the pixel information recording function, the composite image generation function, the RGB value detection function, the distance information addition function, and the corrected composite image generation In relation to the function, N corrections are made by causing the control unit to repeatedly execute each function N times in order based on the distance value selected at the second time or later by the pixel distance value extraction function. Generate a stitched image,
In the distance-added composite image generation function, the distance information added to 1 / N pixels of the total number of pixels of the corrected composite image is read by overlapping all of the N corrected composite images. By causing the control unit to obtain distance information of all the pixels in the corrected composite image, and adding the obtained distance information to a single composite image, all the pixels have the distance. A non-transitory computer-readable recording medium on which the image distance calculation program according to claim 5 is recorded for realizing generation of the one combined image to which information is added.

A computer-readable non-transitory recording medium storing an image distance calculation program for an image distance calculation device that calculates, based on a moving image captured by a single moving camera, the distance from the camera to an object to be photographed recorded in the moving image,
In the control unit of the image distance calculation device,
A frame image extraction function for extracting a frame image at an arbitrary time of the video image;
A slice image generation function for generating a slice image whose vertical axis is the y-axis and whose horizontal axis is the t-axis (1 ≦ t ≦ T), where, in the frame image, the axis extending in the moving direction of the camera is the x-axis and the axis orthogonal to the x-axis is the y-axis, by extracting the temporal change of the pixel column on the y-axis at the point x0 of the x-axis from time t0 + 1 to time t0 + T;
A spotting point calculation function for calculating, as a spotting point, the coordinates of the pixel of the frame image corresponding to the pixel at time T in the slice image, where the pixel of the slice image at time t (1 ≦ t ≦ T) is g(t, y) and the pixel in xyt space at time t0 at the point y′ (1 ≦ y′ ≦ Y) on the y-axis of the frame image is f(x, y′, t0) = r(x), by using a matching process based on dynamic programming to obtain the pixel r(x) of the frame image, located at an arbitrary point in the interval [1, X] of x, that corresponds to the pixel g(t, y) of the slice image;
A pixel matching function for obtaining the correspondence between each pixel from t = 1 to t = T on the t-axis of the slice image and the corresponding pixel of the frame image, by performing backtrace processing from time t = T to time t = 1 based on the spotting point calculated by the spotting point calculation function;
A global distance calculation function that, where x(t) is the x-axis pixel position of the frame image at time t obtained by the pixel matching function, x(t0) is the x-axis pixel position of the frame image at time t0, and the accumulated dynamic parallax α(t, t0) is the pixel distance obtained by subtracting x(t0) from x(t),
where the global distance zg is the distance from the camera to the object to be photographed at the pixel x(t) of the frame image,
where the range of the accumulated dynamic parallax α(t, t0) is set to μ₁ ≦ α(t, t0) ≦ γ₁ using a constant μ₁ and a constant γ₁,
and where the range of the global distance zg is set to z_N1 ≦ zg ≦ z_L1 using a constant z_N1 and a constant z_L1,
calculates the coefficient a by
$$a = z_{L1} \exp\left(\frac{\mu_1}{\gamma_1 - \mu_1} \log\frac{z_{L1}}{z_{N1}}\right),$$
calculates the coefficient b by
$$b = \frac{1}{\gamma_1 - \mu_1} \log\frac{z_{L1}}{z_{N1}},$$
and calculates the global distance zg at the pixel x(t) from the accumulated dynamic parallax α(t, t0), the coefficient a, and the coefficient b by
$$z_g = a \exp\left(-b\,\alpha(t, t_0)\right);$$
The computer-readable non-transitory recording medium on which is recorded the image distance calculation program for causing the control unit to realize the above functions.
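The exponential distance function in the claim above can be checked numerically. The following is a minimal sketch, not the applicant's implementation: the function names `distance_coefficients` and `global_distance` and the sample constants are assumptions chosen for illustration. It shows that the coefficients a and b map the smallest parallax μ₁ to the farthest distance z_L1 and the largest parallax γ₁ to the nearest distance z_N1.

```python
import math


def distance_coefficients(mu, gamma, z_near, z_far):
    """Return (a, b) so that a*exp(-b*mu) == z_far and a*exp(-b*gamma) == z_near.

    mu, gamma: lower and upper bounds of the accumulated dynamic parallax.
    z_near, z_far: lower and upper bounds of the distance (z_N1, z_L1).
    """
    if not (gamma > mu and z_far > z_near > 0):
        raise ValueError("expected gamma > mu and z_far > z_near > 0")
    b = math.log(z_far / z_near) / (gamma - mu)
    a = z_far * math.exp(mu * b)
    return a, b


def global_distance(alpha, a, b):
    """Distance for an accumulated dynamic parallax alpha: a * exp(-b * alpha)."""
    return a * math.exp(-b * alpha)


# Illustrative constants only (not taken from the source).
a, b = distance_coefficients(mu=1.0, gamma=50.0, z_near=2.0, z_far=100.0)
z_at_mu = global_distance(1.0, a, b)      # farthest distance, z_far
z_at_gamma = global_distance(50.0, a, b)  # nearest distance, z_near
```

A larger parallax therefore yields a smaller distance, which matches the intuition that nearby objects move faster across the frame as the camera moves.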

In the control unit,
A code detection function that, for two frame images captured by the camera at different times and partially containing a common image portion, extracts the RGB values of all pixels of the two frame images and detects RGB values that do not match any of the extracted RGB values as code RGB values;
A pixel distance value extraction function for extracting the distance value of the pixel for which the global distance has been calculated by the global distance calculation function from the pixels of the two frame images;
A code RGB value assignment function that assigns the code RGB values to the respective distance values extracted by the pixel distance value extraction function so that no code RGB value is assigned to more than one distance value;
An RGB value replacement function that replaces the RGB value of each pixel of the two frame images whose distance value equals a distance value to which a code RGB value has been assigned by the code RGB value assignment function with the code RGB value assigned to that distance value;
A pixel information recording function that records each RGB value after replacement by the RGB value replacement function in association with the distance value of the pixel whose RGB value was replaced;
A composite image generation function that generates a single composite image by applying a stitching algorithm to the two frame images whose pixel RGB values have been replaced by the RGB value replacement function, thereby joining the two frame images;
An RGB value detection function that detects, among the RGB values of all pixels of the composite image generated by the composite image generation function, RGB values that match or approximate the RGB values recorded by the pixel information recording function;
A distance information addition function that adds, to each pixel having an RGB value detected by the RGB value detection function, the distance value associated with that RGB value by the pixel information recording function as the distance information of the pixel;
An RGB value changing function for changing the RGB value of the pixel to which the distance information is added by the distance information adding function to an average value of RGB values of pixels around the pixel;
A non-transitory computer-readable recording medium on which the image distance calculation program according to claim 7 is recorded.
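The code detection, assignment, replacement, and recording steps of the claim above can be illustrated on toy data. This is a hedged sketch under assumptions not stated in the source: images are nested lists of (R, G, B) tuples, a distance of `None` marks a pixel with no calculated distance, and the names `find_code_rgb_values` and `encode_distances` are hypothetical. The stitching step and the later recovery of distances from the composite image are not shown.

```python
def find_code_rgb_values(images, count):
    """Return `count` RGB triples that occur in none of the given images."""
    if count == 0:
        return []
    used = {pixel for image in images for row in image for pixel in row}
    codes = []
    for value in range(256 ** 3):
        rgb = (value >> 16, (value >> 8) & 0xFF, value & 0xFF)
        if rgb not in used:
            codes.append(rgb)
            if len(codes) == count:
                return codes
    raise ValueError("not enough unused RGB values")


def encode_distances(image, distances, code_for_distance):
    """Replace the RGB value of every pixel whose distance has a code assigned."""
    return [
        [code_for_distance.get(d, rgb) for rgb, d in zip(rgb_row, d_row)]
        for rgb_row, d_row in zip(image, distances)
    ]


# Two 1x2 toy frames and their per-pixel distances (None = no distance).
frame_a = [[(0, 0, 0), (0, 0, 1)]]
frame_b = [[(0, 0, 0), (0, 0, 3)]]
dist_a = [[5.0, None]]
dist_b = [[None, 7.5]]

distinct = sorted(
    {d for dist in (dist_a, dist_b) for row in dist for d in row if d is not None}
)
codes = find_code_rgb_values([frame_a, frame_b], len(distinct))
code_for_distance = dict(zip(distinct, codes))  # one distinct code per distance
encoded_a = encode_distances(frame_a, dist_a, code_for_distance)
encoded_b = encode_distances(frame_b, dist_b, code_for_distance)
# Recorded table used later to turn detected code colors back into distances.
distance_for_code = {code: d for d, code in code_for_distance.items()}
```

Because the code colors do not occur anywhere in the original frames, any pixel of the stitched image that carries one of them can be traced back to a unique distance value.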

A computer-readable non-transitory recording medium on which is recorded an image distance calculation program for an image distance calculation device that calculates, based on a moving image captured by a single moving camera, the distance from the camera to an object to be photographed that appears in the moving image,
In the control unit of the image distance calculation device,
A frame image extraction function for extracting a frame image at an arbitrary time of the video image;
A slice image generation function that, with the axis of the frame image extending in the moving direction of the camera taken as the x-axis and the axis orthogonal to the x-axis taken as the y-axis, generates a slice image whose vertical axis is the y-axis and whose horizontal axis is the t-axis (1 ≦ t ≦ T) by extracting the temporal change, from time t0 + 1 to time t0 + T, of the pixel column on the y-axis at the point x0 on the x-axis;
A spotting point calculation function that, where the pixel of the slice image at time t (1 ≦ t ≦ T) is g(t, y) and the pixel in xyt space at time t0 at the point y′ (1 ≦ y′ ≦ Y) on the y-axis of the frame image is f(x, y′, t0) = r(x), obtains, by a matching process based on dynamic programming, the pixel r(x) of the frame image that corresponds to the pixel g(t, y) of the slice image and lies at an arbitrary point in the interval [1, X] of x, and thereby calculates, as a spotting point, the coordinates of the pixel of the frame image corresponding to the pixel at time T in the slice image;
A pixel matching function that, based on the spotting point calculated by the spotting point calculation function, performs backtrace processing from time t = T to time t = 1 and thereby obtains the pixels of the frame image corresponding to the pixels from t = 1 to t = T on the t-axis of the slice image as x(1), x(2), x(3), ..., x(i), ..., x(T) (1 ≦ i ≦ T);
A detailed distance calculation function that, where the distance difference between the pixel x(i) of the frame image obtained by the pixel matching function and its adjacent pixel x(i−1) is x(i) − x(i−1), obtains the distance differences between adjacent pixels over the K pixels (where K < T) following the pixel x(i), namely x(i+1) − x(i), x(i+2) − x(i+1), x(i+3) − x(i+2), ..., x(i+K−1) − x(i+K−2), x(i+K) − x(i+K−1), takes the median of the obtained distance differences as Med(i), and takes the accumulated dynamic parallax at the pixel x(i) to be Med(i)·K,
where the distance z(i, x) is the detailed distance from the camera to the object to be photographed at the pixel x(i) of the frame image,
where the range of the accumulated dynamic parallax Med(i)·K is set to μ₂ ≦ Med(i)·K ≦ γ₂ using a constant μ₂ and a constant γ₂,
and where the range of the distance z(i, x) at the pixel x(i) is set to z_N2 ≦ z(i, x) ≦ z_L2 using a constant z_N2 and a constant z_L2,
calculates the coefficient a by
$$a = z_{L2} \exp\left(\frac{\mu_2}{\gamma_2 - \mu_2} \log\frac{z_{L2}}{z_{N2}}\right),$$
calculates the coefficient b by
$$b = \frac{1}{\gamma_2 - \mu_2} \log\frac{z_{L2}}{z_{N2}},$$
and calculates the distance z(i, x) at the pixel x(i) from the accumulated dynamic parallax Med(i)·K, the coefficient a, and the coefficient b by
$$z(i, x) = a \exp\left(-b \cdot \mathrm{Med}(i) \cdot K\right);$$
The computer-readable non-transitory recording medium on which is recorded the image distance calculation program for causing the control unit to realize the above functions.
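The median-based parallax of the claim above can be sketched as follows. This is an illustrative sketch only: the list `x` holds hypothetical matched positions using 0-based indices (so `x[0]` plays the role of x(1)), the function names are assumptions, and the coefficients a and b are passed in rather than derived. It shows why the median is used: a single outlier difference does not dominate Med(i)·K.

```python
import math
from statistics import median


def accumulated_parallax(x, i, k):
    """Return Med(i) * K from the K adjacent differences that follow x[i]."""
    if k < 1 or i + k >= len(x):
        raise ValueError("need K >= 1 and K matched pixels after index i")
    diffs = [x[j + 1] - x[j] for j in range(i, i + k)]
    return median(diffs) * k


def detailed_distance(x, i, k, a, b):
    """Distance at x[i]: a * exp(-b * Med(i) * K)."""
    return a * math.exp(-b * accumulated_parallax(x, i, k))


# Hypothetical matched x positions; the jump to 40 acts as an outlier.
x = [10, 12, 14, 17, 19, 40]
parallax_0 = accumulated_parallax(x, 0, 4)  # differences 2, 2, 3, 2
parallax_1 = accumulated_parallax(x, 1, 4)  # differences 2, 3, 2, 21
z_0 = detailed_distance(x, 0, 4, a=100.0, b=0.1)
```

In the second window the difference of 21 would inflate a plain sum to 28, whereas the median-based value stays at 10.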

In the control unit,
In the detailed distance calculation function, the control unit is caused to calculate the distance of the pixels of the frame image other than the pixels x(i) (1 ≦ i ≦ T) by interpolation using the distance values z(i, x) obtained at the pixels x(i). The non-transitory computer-readable recording medium on which the image distance calculation program according to claim 9 is recorded.
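The claim above specifies interpolation without naming a method. The sketch below assumes linear interpolation along one image row and holds the nearest known value constant outside the outermost matched pixels; both choices, and the name `interpolate_distances`, are assumptions made for illustration rather than details from the source.

```python
import bisect


def interpolate_distances(known, width):
    """Fill a row of `width` pixels from sparse {x position: distance} samples.

    Linear interpolation between neighbouring samples; outside the outermost
    samples the nearest known distance is repeated.
    """
    xs = sorted(known)
    if not xs:
        raise ValueError("at least one known distance is required")
    row = []
    for x in range(width):
        if x <= xs[0]:
            row.append(known[xs[0]])
        elif x >= xs[-1]:
            row.append(known[xs[-1]])
        else:
            hi = bisect.bisect_left(xs, x)  # first sample position >= x
            lo = hi - 1
            w = (x - xs[lo]) / (xs[hi] - xs[lo])
            row.append(known[xs[lo]] * (1 - w) + known[xs[hi]] * w)
    return row


# Distances known only at the matched pixels x = 2 and x = 6 (0-based).
row = interpolate_distances({2: 10.0, 6: 30.0}, width=8)
```

Any other interpolation scheme could be substituted here without changing the structure of the claim.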

In the control unit,
A code detection function that, for two frame images that were captured by the camera at different times, partially contain a common image portion, and for which the distance from the camera to the object to be photographed has been calculated for each pixel by the detailed distance calculation function, extracts the RGB values of all pixels of the two frame images and detects RGB values that do not match any of the extracted RGB values as code RGB values;
A pixel distance value extraction function for randomly selecting 1 / N (N is a positive number) of the total number of pixels of the two frame images and extracting a distance value of the selected pixels;
A code RGB value assignment function that assigns the code RGB values to the respective distance values extracted by the pixel distance value extraction function so that no code RGB value is assigned to more than one distance value;
An RGB value replacement function that replaces the RGB value of each pixel of the two frame images whose distance value equals a distance value to which a code RGB value has been assigned by the code RGB value assignment function with the code RGB value assigned to that distance value;
A pixel information recording function that records each RGB value after replacement by the RGB value replacement function in association with the distance value of the pixel whose RGB value was replaced;
A composite image generation function that generates a single composite image by applying a stitching algorithm to the two frame images whose pixel RGB values have been replaced by the RGB value replacement function, thereby joining the two frame images;
An RGB value detection function that detects, among the RGB values of all pixels of the composite image generated by the composite image generation function, RGB values that match or approximate the RGB values recorded by the pixel information recording function;
A distance information addition function that adds, to each pixel having an RGB value detected by the RGB value detection function, the distance value associated with that RGB value by the pixel information recording function as the distance information of the pixel;
A corrected composite image generation function that generates a corrected composite image, in which the RGB values are corrected, by changing the RGB value of each pixel to which the distance information has been added by the distance information addition function to the average of the RGB values of the pixels surrounding that pixel;
A distance-added composite image generation function that generates, based on the N corrected composite images generated by the corrected composite image generation function, one composite image in which the distance information is added to all pixels,
In the pixel distance value extraction function, when the control unit extracts pixel distance values for the second and subsequent times, 1/N of the total number of pixels are randomly selected from among the pixels of the two frame images that have not previously been selected, and the distance values of the selected pixels are extracted,
With respect to the code RGB value assignment function, the RGB value replacement function, the pixel information recording function, the composite image generation function, the RGB value detection function, the distance information addition function, and the corrected composite image generation function, the control unit is caused to execute the functions in order, N times in total, based on the distance values selected by the pixel distance value extraction function for the second and subsequent times, thereby generating N corrected composite images,
In the distance-added composite image generation function, the control unit overlays all N corrected composite images and reads the distance information added to 1/N of the total number of pixels of each corrected composite image, thereby obtaining the distance information of all pixels of the corrected composite image, and adds the obtained distance information to a single composite image, thereby generating the one composite image in which the distance information is added to all pixels. The non-transitory computer-readable recording medium on which the image distance calculation program according to claim 10 is recorded.
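The N-round selection and the final overlay described in the claim above can be sketched without any image processing. The sketch assumes that the pixels are shuffled once and dealt into N disjoint groups, which is one way to guarantee that each round draws only from pixels not selected before; the names `partition_pixels` and `merge_distance_layers` are hypothetical, and the per-round encoding and stitching are omitted.

```python
import random


def partition_pixels(height, width, n, seed=0):
    """Split all pixel coordinates into n disjoint, randomly chosen groups."""
    coords = [(y, x) for y in range(height) for x in range(width)]
    random.Random(seed).shuffle(coords)
    # Group r holds about 1/n of the pixels; no pixel appears in two groups.
    return [coords[r::n] for r in range(n)]


def merge_distance_layers(layers, height, width):
    """Overlay the n sparse distance layers into one dense distance map."""
    merged = [[None] * width for _ in range(height)]
    for layer in layers:
        for (y, x), distance in layer.items():
            merged[y][x] = distance
    return merged


# Toy 2x3 distance map processed in N = 3 rounds.
distances = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
groups = partition_pixels(height=2, width=3, n=3)
layers = [{(y, x): distances[y][x] for (y, x) in group} for group in groups]
merged = merge_distance_layers(layers, height=2, width=3)
```

Each layer stands in for the distance information carried by one corrected composite image; overlaying all N layers recovers a distance for every pixel.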

A frame image extraction unit that extracts a frame image at an arbitrary time of the moving image based on the moving image captured by one moving camera;
A slice image generation unit that, with the axis of the frame image extending in the moving direction of the camera taken as the x-axis and the axis orthogonal to the x-axis taken as the y-axis, generates a slice image whose vertical axis is the y-axis and whose horizontal axis is the t-axis (1 ≦ t ≦ T) by extracting the temporal change, from time t0 + 1 to time t0 + T, of the pixel column on the y-axis at the point x0 on the x-axis;
A spotting point calculation unit that, where the pixel of the slice image at time t (1 ≦ t ≦ T) is g(t, y) and the pixel in xyt space at time t0 at the point y′ (1 ≦ y′ ≦ Y) on the y-axis of the frame image is f(x, y′, t0) = r(x), obtains, by a matching process based on dynamic programming, the pixel r(x) of the frame image that corresponds to the pixel g(t, y) of the slice image and lies at an arbitrary point in the interval [1, X] of x, and thereby calculates, as a spotting point, the coordinates of the pixel of the frame image corresponding to the pixel at time T in the slice image;
A pixel matching unit that, based on the spotting point calculated by the spotting point calculation unit, performs backtrace processing from time t = T to time t = 1 and thereby obtains the correspondence between each pixel from t = 1 to t = T on the t-axis of the slice image and the corresponding pixel of the frame image;
A region dividing unit that divides each of the frame image and the slice image into regions based on a common division criterion by applying the mean-shift method to each image;
A corresponding region determination unit that detects, based on the pixels present in a divided region of the slice image produced by the region dividing unit, the pixels of the frame image that correspond to those pixels of the slice image as obtained by the pixel matching unit, finds the divided region of the frame image that contains the largest number of the detected pixels, and thereby determines that divided region of the frame image as the corresponding region for the divided region of the slice image;
A global distance calculation unit that detects the average q of the number of pixels in the x-axis direction in the corresponding region of the frame image determined by the corresponding region determination unit, detects the average p of the number of pixels in the t-axis direction in the corresponding divided region of the slice image, calculates, for each corresponding region, a ratio value based on the ratio of q to p or the ratio of p to q, and calculates, for each corresponding region, the distance corresponding to the calculated ratio value as a global distance by using a distance function in which the correspondence between the ratio value and the distance from the camera to the object to be photographed appearing in the frame image is predetermined;
An image distance calculation device comprising the units recited above.
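The corresponding region determination and the ratio value of the claim above can be sketched on small label maps. Several details here are assumptions, not statements from the source: the region labels are taken as given (the mean-shift segmentation itself is not shown), `match` is a hypothetical dictionary from a slice pixel (t, y) to a frame pixel (x, y), and the averages p and q are computed as the mean number of region pixels per row. The predetermined distance function is not specified in this section, so only the ratio is computed.

```python
from collections import Counter


def corresponding_region(slice_labels, frame_labels, match, region_id):
    """Frame region that receives most matches from one slice region."""
    votes = Counter()
    for (t, y), (x, frame_y) in match.items():
        if slice_labels[y][t] == region_id:
            votes[frame_labels[frame_y][x]] += 1
    return votes.most_common(1)[0][0] if votes else None


def mean_pixels_per_row(labels, region_id):
    """Average horizontal extent of a region over the rows that contain it."""
    counts = [row.count(region_id) for row in labels if region_id in row]
    return sum(counts) / len(counts)


# Label maps indexed as labels[y][t] (slice) and labels[y][x] (frame).
slice_labels = [[0, 0, 1, 1], [0, 0, 1, 1]]
frame_labels = [[5, 5, 5, 5, 7, 7], [5, 5, 5, 7, 7, 7]]
match = {
    (0, 0): (0, 0), (1, 0): (2, 0), (2, 0): (4, 0), (3, 0): (5, 0),
    (0, 1): (0, 1), (1, 1): (1, 1), (2, 1): (2, 1), (3, 1): (4, 1),
}

region = corresponding_region(slice_labels, frame_labels, match, region_id=1)
p = mean_pixels_per_row(slice_labels, 1)       # t-axis extent in the slice
q = mean_pixels_per_row(frame_labels, region)  # x-axis extent in the frame
ratio = q / p
```

The majority vote tolerates individual mismatches: one of the four pixels of slice region 1 lands in frame region 5, yet region 7 is still selected.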

A code detection unit that, for two frame images captured by the camera at different times and partially containing a common image portion, extracts the RGB values of all pixels of the two frame images and detects RGB values that do not match any of the extracted RGB values as code RGB values;
A pixel distance value extraction unit that extracts the distance value of the pixel for which the global distance is calculated by the global distance calculation unit from the pixels of the two frame images;
A code RGB value assigning unit that assigns the code RGB values to the respective distance values extracted by the pixel distance value extraction unit so that no code RGB value is assigned to more than one distance value;
An RGB value replacement unit that replaces the RGB value of each pixel of the two frame images whose distance value equals a distance value to which a code RGB value has been assigned by the code RGB value assigning unit with the code RGB value assigned to that distance value;
A pixel information recording unit that records each RGB value after replacement by the RGB value replacement unit in association with the distance value of the pixel whose RGB value was replaced;
A composite image generation unit that generates a single composite image by applying a stitching algorithm to the two frame images whose pixel RGB values have been replaced by the RGB value replacement unit, thereby joining the two frame images;
An RGB value detection unit that detects, among the RGB values of all pixels of the composite image generated by the composite image generation unit, RGB values that match or approximate the RGB values recorded in the pixel information recording unit;
A distance information adding unit that adds, to each pixel having an RGB value detected by the RGB value detection unit, the distance value associated with that RGB value in the pixel information recording unit as the distance information of the pixel;
An RGB value changing unit that changes an RGB value of the pixel to which the distance information is added by the distance information adding unit to an average value of RGB values of pixels around the pixel;
The image distance calculation device according to claim 12, further comprising the units recited above.

A local distance calculation unit that obtains, by matching processing with both end points fixed and backtrace processing based on dynamic programming, the correspondence between the pixels from the start boundary to the end boundary in the t-axis direction in a divided region of the slice image and the pixels from the start boundary to the end boundary in the x-axis direction in the corresponding region of the frame image that corresponds to that divided region, thereby obtains the pixel corresponding to each pixel in the divided region of the slice image as a pixel in the corresponding region of the frame image, and calculates, based on the x-axis spacing of the obtained pixels in the corresponding region of the frame image, a relative distance for each pixel in the corresponding region as a local distance;
A detailed distance calculation unit that calculates, for each pixel of the frame image, a detailed distance to the object to be photographed by adding the global distance for each corresponding region of the frame image calculated by the global distance calculation unit to the local distance for each pixel of the frame image calculated by the local distance calculation unit;
The image distance calculation device according to claim 12, further comprising the units recited above.

A code detection unit that, for two frame images that were captured by the camera at different times, partially contain a common image portion, and for which the distance from the camera to the object to be photographed has been calculated for each pixel by the detailed distance calculation unit, extracts the RGB values of all pixels of the two frame images and detects RGB values that do not match any of the extracted RGB values as code RGB values;
A pixel distance value extracting unit that randomly selects 1 / N (N is a positive number) of the total number of pixels of the two frame images, and extracts a distance value of the selected pixels;
A code RGB value assigning unit that assigns the code RGB values to the respective distance values extracted by the pixel distance value extracting unit so that no code RGB value is assigned to more than one distance value;
An RGB value replacement unit that replaces the RGB value of each pixel of the two frame images whose distance value equals a distance value to which a code RGB value has been assigned by the code RGB value assigning unit with the code RGB value assigned to that distance value;
A pixel information recording unit that records each RGB value after replacement by the RGB value replacement unit in association with the distance value of the pixel whose RGB value was replaced;
A composite image generation unit that generates a single composite image by applying a stitching algorithm to the two frame images whose pixel RGB values have been replaced by the RGB value replacement unit, thereby joining the two frame images;
An RGB value detection unit that detects, among the RGB values of all pixels of the composite image generated by the composite image generation unit, RGB values that match or approximate the RGB values recorded in the pixel information recording unit;
A distance information adding unit that adds, to each pixel having an RGB value detected by the RGB value detection unit, the distance value associated with that RGB value in the pixel information recording unit as the distance information of the pixel;
A corrected composite image generation unit that generates a corrected composite image, in which the RGB values are corrected, by changing the RGB value of each pixel to which the distance information has been added by the distance information adding unit to the average of the RGB values of the pixels surrounding that pixel;
A distance-added composite image generation unit that generates, based on the N corrected composite images generated by the corrected composite image generation unit, one composite image in which the distance information is added to all pixels,
When extracting pixel distance values for the second and subsequent times, the pixel distance value extracting unit randomly selects 1/N of the total number of pixels from among the pixels of the two frame images that have not previously been selected, and extracts the distance values of the selected pixels,
The code RGB value assigning unit, the RGB value replacement unit, the pixel information recording unit, the composite image generation unit, the RGB value detection unit, the distance information adding unit, and the corrected composite image generation unit generate N corrected composite images by repeating their respective processes in order, N times in total, based on the distance values selected by the pixel distance value extracting unit for the second and subsequent times,
The distance-added composite image generation unit overlays all N corrected composite images and reads the distance information added to 1/N of the total number of pixels of each corrected composite image, thereby obtaining the distance information of all pixels of the corrected composite image, and adds the obtained distance information to a single composite image, thereby generating the one composite image in which the distance information is added to all pixels. The image distance calculation device according to claim 14.

A detailed distance calculation unit that obtains, by matching processing with both end points fixed and backtrace processing based on dynamic programming, the correspondence between the pixels from the start boundary to the end boundary in the t-axis direction in a divided region of the slice image and the pixels from the start boundary to the end boundary in the x-axis direction in the corresponding region of the frame image that corresponds to that divided region, thereby obtaining the pixels in the corresponding region of the frame image that correspond to the pixels in the divided region of the slice image as x(1), x(2), ..., x(i), ..., x(G−1), x(G) (1 ≦ i ≦ G),
where xa is the average number of pixels from the start boundary to the end boundary in the x-axis direction in the corresponding region of the frame image,
where x(i) − x(i−1) is the distance between the pixel x(i) in the corresponding region of the frame image and the adjacent pixel x(i−1) obtained by the backtrace processing,
where zg is the global distance of the corresponding region calculated by the global distance calculation unit,
and where z(i) is the detailed distance from the camera to the object to be photographed at the pixel x(i) of the frame image,
calculates the distance z(i), using a positive constant β, by
$$z(i) = z_g + \beta\left(x(i) - x(i-1) - \frac{x_a}{G}\right).$$
The image distance calculation device according to claim 12, further comprising the detailed distance calculation unit recited above.
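The formula in the claim above adjusts the region-wide distance zg by how much the local pixel spacing deviates from the average spacing xa/G. The sketch below is illustrative only: the function name, the sample positions, and the constants are assumptions, the list `x` is 0-based (so `x[0]` plays the role of x(1)), and values are produced from the second matched pixel onward because x(i−1) is not defined for the first one.

```python
def detailed_distances(x, zg, beta, xa):
    """Return z(i) = zg + beta * (x(i) - x(i-1) - xa / G) for i = 2..G."""
    g = len(x)
    if g < 2:
        raise ValueError("at least two matched pixels are required")
    mean_spacing = xa / g
    return [zg + beta * ((x[i] - x[i - 1]) - mean_spacing) for i in range(1, g)]


# Hypothetical matched positions in one corresponding region (G = 5).
x = [10, 12, 14, 18, 20]
z = detailed_distances(x, zg=50.0, beta=0.5, xa=10.0)
```

With an average spacing of 2 pixels, only the third interval (spacing 4) departs from the global distance, by β times the excess spacing.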

A code detection unit that, for two frame images that were captured by the camera at different times, partially contain a common image portion, and for which the distance from the camera to the object to be photographed has been calculated for each pixel by the detailed distance calculation unit, extracts the RGB values of all pixels of the two frame images and detects RGB values that do not match any of the extracted RGB values as code RGB values;
A pixel distance value extracting unit that randomly selects 1 / N (N is a positive number) of the total number of pixels of the two frame images, and extracts a distance value of the selected pixels;
A code RGB value assigning unit that assigns the code RGB values to the respective distance values extracted by the pixel distance value extracting unit so that no code RGB value is assigned to more than one distance value;
An RGB value replacement unit that replaces the RGB value of each pixel of the two frame images whose distance value equals a distance value to which a code RGB value has been assigned by the code RGB value assigning unit with the code RGB value assigned to that distance value;
A pixel information recording unit that records each RGB value after replacement by the RGB value replacement unit in association with the distance value of the pixel whose RGB value was replaced;
A composite image generation unit that generates a single composite image by applying a stitching algorithm to the two frame images whose pixel RGB values have been replaced by the RGB value replacement unit, thereby joining the two frame images;
An RGB value detection unit that detects, among the RGB values of all pixels of the composite image generated by the composite image generation unit, RGB values that match or approximate the RGB values recorded in the pixel information recording unit;
A distance information adding unit that adds, to each pixel having an RGB value detected by the RGB value detection unit, the distance value associated with that RGB value in the pixel information recording unit as the distance information of the pixel;
A corrected composite image generation unit that generates a corrected composite image, in which the RGB values are corrected, by changing the RGB value of each pixel to which the distance information has been added by the distance information adding unit to the average of the RGB values of the pixels surrounding that pixel;
A distance-added composite image generation unit that generates, based on the N corrected composite images generated by the corrected composite image generation unit, one composite image in which the distance information is added to all pixels,
When extracting pixel distance values for the second and subsequent times, the pixel distance value extracting unit randomly selects 1/N of the total number of pixels from among the pixels of the two frame images that have not previously been selected, and extracts the distance values of the selected pixels,
The code RGB value assigning unit, the RGB value replacement unit, the pixel information recording unit, the composite image generation unit, the RGB value detection unit, the distance information adding unit, and the corrected composite image generation unit generate N corrected composite images by repeating their respective processes in order, N times in total, based on the distance values selected by the pixel distance value extracting unit for the second and subsequent times,
The distance-added composite image generation unit overlays all N corrected composite images and reads the distance information added to 1/N of the total number of pixels of each corrected composite image, thereby obtaining the distance information of all pixels of the corrected composite image, and adds the obtained distance information to a single composite image, thereby generating the one composite image in which the distance information is added to all pixels. The image distance calculation device according to claim 16.