JP2025059651A

JP2025059651A - Learning database construction system, control device used in said system, and learning database construction method

Info

Publication number: JP2025059651A
Application number: JP2023169882A
Authority: JP
Inventors: Kozo Moriyama; Susumu Kameyama; Truong Vu Gia
Original assignee: Johnan Corp
Current assignee: Johnan Corp
Priority date: 2023-09-29
Filing date: 2023-09-29
Publication date: 2025-04-10
Also published as: WO2025070751A1

Abstract

To enable building a learning database with which the accuracy of recognizing a target object can be improved. SOLUTION: The present invention comprises: a 3D camera 24 capable of capturing an image of a target object 40; an imaging robot 20 capable of changing the position and attitude of the 3D camera 24; and a control device 10 capable of acquiring information of the image of the target object 40 captured by the 3D camera 24. The control device 10 generates three-dimensional information of the target object 40 by controlling the imaging robot 20 and carrying out three-dimensional imaging in which the 3D camera 24 captures images of the target object 40 from a plurality of directions, automatically generates annotation information for the image of the target object 40 captured by the 3D camera 24 on the basis of the three-dimensional information, and builds a learning database in which a plurality of pieces of data for each of the annotation information and the three-dimensional physical information of the target object 40 relative to the 3D camera 24 are adopted as learning data. SELECTED DRAWING: Figure 1

Description

本発明は学習データベース構築システム、および、該システムに利用される制御装置、並びに、学習データベース構築方法に係る。 The present invention relates to a learning database construction system, a control device used in the system, and a learning database construction method.

従来、画像認識技術におけるＡＩの機械学習に利用される学習データ（教師データとも呼ばれる）のデータベースを構築するに当たっては、対象とする物体（以下、対象物体という）を高い精度で認識しておくことが望まれる。また、この学習データベースの構築には多くの時間と労力とが必要となっていた。 Conventionally, when constructing a database of learning data (also called teacher data) used in AI machine learning in image recognition technology, it is desirable to recognize the target object (hereinafter referred to as the target object) with high accuracy. In addition, constructing this learning database requires a lot of time and effort.

対象物体を撮影して学習データベースを構築する技術を開示するものとして、特許文献１および特許文献２が知られている。 Patent Document 1 and Patent Document 2 are known as examples that disclose technology for photographing a target object and building a learning database.

特許文献１には、学習データセット（学習データベースに相当）の構築段階において、位置姿勢検出用マーカであるＡＲマーカ（２次元パターンマーカ）に対象物体の物体情報を関連付けることが開示されている。また、この特許文献１には、データベースに、ＡＲマーカと対象物体の物体名称等の物体情報とが関連付けられており、学習データセット作製装置のコンピュータがＡＲマーカ認識手段として動作することにより、ＡＲマーカを認識して、対象物体の物体情報を取得することが開示されている。 Patent Document 1 discloses that in the construction stage of a learning dataset (corresponding to a learning database), object information of a target object is associated with an AR marker (two-dimensional pattern marker), which is a marker for detecting position and orientation. Patent Document 1 also discloses that the AR marker is associated with object information such as the object name of the target object in a database, and that a computer of the learning dataset creation device operates as an AR marker recognition means to recognize the AR marker and obtain object information of the target object.

また、特許文献２には、ワークを把持するシミュレータ上のロボットハンドが把持動作を経て把持を成功させるときのロボットハンドの３次元座標データと、ワークをシミュレータ上の２次元撮像装置ＩＤによって所定画角から撮像した２次元撮像画像データとを備える学習用データセットをシミュレータから取得して複数組記憶するデータセット記憶部と、ワークを２次元撮像装置ＩＤによって所定画角と同じ画角から撮像した２次元撮像画像から、現実世界におけるロボットハンドの３次元座標を推論する学習モデルを構築することが開示されている。 Patent Document 2 discloses a dataset storage unit that acquires from a simulator, and stores as multiple sets, learning datasets each comprising three-dimensional coordinate data of a robot hand on the simulator at the time it succeeds in gripping a workpiece through a gripping operation, and two-dimensional image data of the workpiece captured from a predetermined angle of view by a two-dimensional imaging device ID on the simulator. It also discloses constructing a learning model that infers the three-dimensional coordinates of the robot hand in the real world from a two-dimensional image of the workpiece captured by the two-dimensional imaging device ID from the same angle of view as the predetermined angle of view.

特許第６４７４１７９号公報 Patent Document 1: Japanese Patent No. 6474179
特開２０２０－８２３２２号公報 Patent Document 2: JP 2020-82322 A

しかしながら、特許文献１のものは、２次元カメラを使用するものであることから、背景の色と対象物体の色とが似ている場合には、対象物体の認識精度が大幅に低下してしまう虞がある。また、ＡＲマーカを使用していることから、対象物体の姿勢が当該ＡＲマーカを認識できない状態となっている場合には、対象物体の情報を取得することができないものとなるため、物体情報を取得可能な対象物体の姿勢が限定されることとなり、精度の高い学習データベースを構築することが難しかった。 However, because the technique in Patent Document 1 uses a two-dimensional camera, there is a risk that the accuracy of recognizing the target object will be significantly reduced if the color of the background and the color of the target object are similar. In addition, because an AR marker is used, if the posture of the target object is such that the AR marker cannot be recognized, it is not possible to obtain information about the target object. This limits the posture of the target object from which object information can be obtained, making it difficult to build a highly accurate learning database.

また、特許文献２のものは、シミュレーションによる画像を利用するものであることから、現実の対象物体の特徴を十分に表現することはできておらず、対象物体の認識精度の向上には限界があることから、精度の高い学習データベースを構築することが困難であった。 In addition, because the technology in Patent Document 2 uses simulated images, it is unable to fully express the characteristics of real target objects, and there is a limit to how much the target object recognition accuracy can be improved, making it difficult to build a highly accurate learning database.

本発明は、かかる点に鑑みてなされたものであり、その目的とするところは、対象物体の認識精度（例えば対象物体の３次元姿勢の認識精度や対象物体における３次元上の特定位置の認識精度）の向上を図ることができる学習データベースを構築することが可能な学習データベース構築システム、および、該システムに利用される制御装置、並びに、学習データベース構築方法を提供することにある。 The present invention has been made in consideration of the above points, and its purpose is to provide a learning database construction system capable of constructing a learning database that can improve the recognition accuracy of a target object (for example, the recognition accuracy of the three-dimensional posture of a target object or the recognition accuracy of a specific three-dimensional position on the target object), as well as a control device and a learning database construction method used in the system.

前記の目的を達成するための本発明の解決手段は、対象物体を撮影可能な３次元カメラと、前記３次元カメラの位置および姿勢を変更可能な可動体と、前記３次元カメラによって撮影された前記対象物体の画像の情報を取得可能な制御装置とを備え、前記制御装置が、前記可動体を制御して前記３次元カメラによって前記対象物体を複数の方向から撮影する３次元撮影を実施することによって前記対象物体の３次元情報を生成する３次元情報生成部と、前記３次元カメラまたは当該３次元カメラとは別のカメラで前記対象物体を撮影した画像に対して前記３次元情報に基づいてアノテーション情報を自動生成するアノテーション情報生成部と、前記アノテーション情報、および、前記アノテーション情報の自動生成時に使用した前記カメラに対する前記対象物体の相対的な３次元物理情報それぞれの複数データを学習データとした学習データベースを構築する学習データベース構築部とを備えていることを特徴とする。 The solution of the present invention for achieving the above object comprises a three-dimensional camera capable of photographing a target object, a movable body capable of changing the position and attitude of the three-dimensional camera, and a control device capable of acquiring information on an image of the target object photographed by the three-dimensional camera, the control device being characterized in comprising a three-dimensional information generation unit that generates three-dimensional information of the target object by controlling the movable body to perform three-dimensional photographing of the target object from multiple directions using the three-dimensional camera, an annotation information generation unit that automatically generates annotation information based on the three-dimensional information for an image of the target object photographed by the three-dimensional camera or a camera other than the three-dimensional camera, and a learning database construction unit that constructs a learning database using multiple data of the annotation information and the relative three-dimensional physical information of the target object with respect to the camera used when automatically generating the annotation information as learning data.

この特定事項により、学習データベースは、アノテーション情報、および、アノテーション情報の自動生成時に使用したカメラに対する対象物体の相対的な３次元物理情報それぞれの複数データを学習データとして構築されているため、対象物体の認識精度（例えば対象物体の３次元姿勢の認識精度や対象物体における３次元上の特定位置の認識精度）の向上を図ることができる学習データベースを構築することが可能となる。例えば、対象物体に対して特定の処理を行うに際しては、当該対象物体の２次元画像を取得するのみで前記３次元物理情報を得ることができ、この３次元物理情報に基づいて対象物体の認識精度を高めることができて、当該対象物体に対する前記特定の処理を高い精度で実施することが可能となる。 With this feature, the learning database is constructed using, as learning data, multiple pieces of data for each of the annotation information and the three-dimensional physical information of the target object relative to the camera used when the annotation information was automatically generated. This makes it possible to construct a learning database that can improve the recognition accuracy of the target object (for example, the recognition accuracy of the three-dimensional posture of the target object or of a specific three-dimensional position on the target object). For example, when a specific process is to be performed on the target object, the three-dimensional physical information can be obtained simply by acquiring a two-dimensional image of the target object. The recognition accuracy of the target object can then be raised on the basis of this three-dimensional physical information, so that the specific process can be performed on the target object with high accuracy.

また、前記３次元物理情報は、前記アノテーション情報の自動生成時に使用した前記カメラに対する前記対象物体の相対的な３次元姿勢情報である。 The three-dimensional physical information is the relative three-dimensional posture information of the target object with respect to the camera used when automatically generating the annotation information.

これによれば、対象物体の２次元画像を取得するのみで対象物体の相対的な３次元姿勢を認識することができる。例えばロボットによって対象物体を把持する場合に、把持位置の最適化を図ることが可能となる。 This makes it possible to recognize the relative three-dimensional posture of a target object simply by acquiring a two-dimensional image of the target object. For example, when a target object is grasped by a robot, it becomes possible to optimize the grasping position.

また、前記３次元物理情報は、前記アノテーション情報の自動生成時に使用した前記カメラに対する前記対象物体における３次元上の特定位置の情報であってもよい。 The three-dimensional physical information may also be information about a specific three-dimensional position of the target object relative to the camera used when automatically generating the annotation information.

これによれば、対象物体の２次元画像を取得するのみで対象物体における３次元上の特定位置を認識することができる。例えばロボットによって対象物体の特定位置を加工する場合に、加工位置を高い精度で特定することが可能となる。 This makes it possible to recognize a specific three-dimensional position on a target object simply by acquiring a two-dimensional image of the target object. For example, when processing a specific position on a target object using a robot, it becomes possible to identify the processing position with high accuracy.

また、前記学習データベース構築システムに利用される制御装置も本発明の技術的思想の範疇である。つまり、可動体に支持されて該可動体の作動によって位置および姿勢を変更可能な３次元カメラによって撮影された対象物体の画像の情報を取得可能な制御装置であって、前記可動体を制御して前記３次元カメラによって前記対象物体を複数の方向から撮影する３次元撮影を実施することによって前記対象物体の３次元情報を生成する３次元情報生成部と、前記３次元カメラまたは当該３次元カメラとは別のカメラで前記対象物体を撮影した画像に対して前記３次元情報に基づいてアノテーション情報を自動生成するアノテーション情報生成部と、前記アノテーション情報、および、前記アノテーション情報の自動生成時に使用した前記カメラに対する前記対象物体の相対的な３次元物理情報それぞれの複数データを学習データとした学習データベースを構築する学習データベース構築部とを備えたものである。 The control device used in the learning database construction system also falls within the scope of the technical idea of the present invention. In other words, the control device is capable of acquiring information on an image of a target object captured by a three-dimensional camera supported on a movable body and capable of changing the position and attitude by operating the movable body, and includes a three-dimensional information generation unit that generates three-dimensional information on the target object by controlling the movable body to perform three-dimensional photography of the target object from multiple directions using the three-dimensional camera, an annotation information generation unit that automatically generates annotation information based on the three-dimensional information for an image of the target object captured by the three-dimensional camera or a camera other than the three-dimensional camera, and a learning database construction unit that constructs a learning database using multiple data on the annotation information and the relative three-dimensional physical information of the target object with respect to the camera used when automatically generating the annotation information as learning data.

また、前記学習データベース構築システムにおいて実施される学習データベース構築方法も本発明の技術的思想の範疇である。つまり、３次元カメラを支持する可動体を制御して前記３次元カメラによって対象物体を複数の方向から撮影する３次元撮影を実施することによって前記対象物体の３次元情報を生成する３次元情報生成工程と、前記３次元カメラまたは当該３次元カメラとは別のカメラで前記対象物体を撮影した画像に対して前記３次元情報に基づいてアノテーション情報を自動生成するアノテーション情報生成工程と、前記アノテーション情報、および、前記アノテーション情報の自動生成時に使用した前記カメラに対する前記対象物体の相対的な３次元物理情報それぞれの複数データを学習データとした学習データベースを構築する学習データベース構築工程とを含むものである。 The learning database construction method implemented in the learning database construction system also falls within the scope of the technical idea of the present invention. In other words, it includes a three-dimensional information generation step of generating three-dimensional information of a target object by controlling a movable body supporting a three-dimensional camera to perform three-dimensional photography of the target object from multiple directions using the three-dimensional camera, an annotation information generation step of automatically generating annotation information based on the three-dimensional information for an image of the target object captured by the three-dimensional camera or a camera other than the three-dimensional camera, and a learning database construction step of constructing a learning database using multiple data of the annotation information and the relative three-dimensional physical information of the target object with respect to the camera used when automatically generating the annotation information as learning data.

これらの特定事項にあっても、前述したように、対象物体の認識精度（例えば対象物体の３次元姿勢の認識精度や対象物体における３次元上の特定位置の認識精度）の向上を図ることができる学習データベースを構築することが可能となる。 Even with these specific features, as mentioned above, it is possible to construct a learning database that can improve the recognition accuracy of the target object (for example, the recognition accuracy of the target object's three-dimensional orientation or the recognition accuracy of a specific three-dimensional position on the target object).

本発明では、可動体を制御して３次元カメラによって対象物体を複数の方向から撮影する３次元撮影を実施することによって前記対象物体の３次元情報を生成する。また、前記３次元カメラまたは当該３次元カメラとは別のカメラで前記対象物体を撮影した画像に対して前記３次元情報に基づいてアノテーション情報を自動生成する。そして、前記アノテーション情報、および、前記アノテーション情報の自動生成時に使用した前記カメラに対する前記対象物体の相対的な３次元物理情報それぞれの複数データを学習データとした学習データベースを構築するようにしている。これにより、対象物体の認識精度（例えば対象物体の３次元姿勢の認識精度や対象物体における３次元上の特定位置の認識精度）の向上を図ることができる学習データベースを構築することが可能となる。 In the present invention, three-dimensional information of a target object is generated by controlling a movable body to perform three-dimensional photography in which the target object is photographed from multiple directions by a three-dimensional camera. Furthermore, annotation information is automatically generated based on the three-dimensional information for an image of the target object photographed by the three-dimensional camera or a camera other than the three-dimensional camera. A learning database is then constructed using multiple pieces of learning data, including the annotation information and the three-dimensional physical information of the target object relative to the camera used when automatically generating the annotation information. This makes it possible to construct a learning database that can improve the recognition accuracy of the target object (for example, the recognition accuracy of the three-dimensional posture of the target object or the recognition accuracy of a specific three-dimensional position on the target object).

第１の実施形態に係る学習データベース構築システムの全体構成を示す概略図である。 FIG. 1 is a schematic diagram showing the overall configuration of a learning database construction system according to a first embodiment.
第１の実施形態に係る制御装置の構成を示すブロック図である。 FIG. 2 is a block diagram showing the configuration of a control device according to the first embodiment.
第１の実施形態に係る撮影ロボットの構成を示すブロック図である。 FIG. 3 is a block diagram showing the configuration of an imaging robot according to the first embodiment.
第１の実施形態に係る準備処理の手順を示すフローチャート図である。 FIG. 4 is a flowchart showing the procedure of a preparation process according to the first embodiment.
第１の実施形態に係る対象物体の点群データの一例を示す図である。 FIG. 5 is a diagram showing an example of point cloud data of a target object according to the first embodiment.
第１の実施形態に係る深層学習処理の手順を示すフローチャート図である。 FIG. 6 is a flowchart showing the procedure of a deep learning process according to the first embodiment.
第１の実施形態に係る対象物体の画像に対して付与されたバウンディングボックスの複数の例を示すイメージ図である。 FIG. 7 is an image diagram showing multiple examples of bounding boxes assigned to images of a target object according to the first embodiment.
第１の実施形態に係る３Ｄカメラに対する対象物体の３次元姿勢を取得する動作を説明するための図である。 FIG. 8 is a diagram for explaining the operation of acquiring the three-dimensional posture of a target object relative to a 3D camera according to the first embodiment.
第７の実施形態に係る準備処理の手順を示すフローチャート図である。 FIG. 9 is a flowchart showing the procedure of a preparation process according to a seventh embodiment.
第９の実施形態に係る対象物体の特定位置の推論動作を説明するための図である。 FIG. 10 is a diagram for explaining the operation of inferring a specific position of a target object according to a ninth embodiment.

以下、本発明の実施形態を図面に基づいて説明する。以下に述べる各実施形態は、ロボット（可動体）によって対象物体に所定の処理（例えば対象物体の把持や対象物体における特定位置の加工等）を行うに当たって利用される学習データベース（教師データベース）を構築するものとして本発明を適用した場合について説明する。尚、以下の説明では、同一の部品や構成要素には同一の符号を付している。それらの名称および機能も同じであるため、それらについての詳細な説明は繰り返さない。 Embodiments of the present invention will be described below with reference to the drawings. Each embodiment described below applies the present invention to the construction of a learning database (teacher database) that is used when a robot (movable body) performs a predetermined process on a target object (e.g., grasping the target object or machining a specific position on it). In the following description, identical parts and components are given the same reference numerals. Their names and functions are also the same, so detailed descriptions of them are not repeated.

＜第１の実施形態＞ <First Embodiment>
本実施形態は、３Ｄカメラ（３次元カメラ）に対する対象物体の３次元姿勢を推定するに当たって利用される学習データベースを構築するものとして本発明を適用した場合について説明する。 This embodiment describes a case in which the present invention is applied to construct a learning database used for estimating the three-dimensional posture of a target object relative to a 3D camera (three-dimensional camera).

－学習データベース構築システムの全体構成および動作概要－ -Overall configuration and operation overview of the learning database construction system-
先ず、図１を参照して、本実施形態に係る学習データベース構築システム１の全体構成について説明する。学習データベース構築システム１は、主たる構成要素として、制御装置１０、撮影ロボット２０、および、載置装置３０等を含んでいる。 First, the overall configuration of a learning database construction system 1 according to this embodiment will be described with reference to Fig. 1. The learning database construction system 1 includes, as main components, a control device 10, an imaging robot 20, a mounting device 30, and the like.

制御装置１０は、サーバやコンピュータ等によって実現されるものであって、有線ＬＡＮまたは無線ＬＡＮを介して撮影ロボット２０や載置装置３０との間で各種データの送受信を行うことにより、撮影ロボット２０に搭載された３Ｄカメラ２４によって撮影された画像の情報を取得する等の各種の処理を実行する。この制御装置１０において行われる処理の詳細については後述する。 The control device 10 is realized by a server, a computer, or the like. By exchanging various data with the imaging robot 20 and the mounting device 30 via a wired LAN or wireless LAN, it executes various processes such as acquiring information on images captured by the 3D camera 24 mounted on the imaging robot 20. The processes performed by the control device 10 are described in detail later.

撮影ロボット２０は、アーム部２７や該アーム部２７の先端に取り付けられた作業部２８等を備えた多関節ロボットで構成されており、制御装置１０から送信される指令情報に基づいて、あるいは自身の判断処理（後述するＣＰＵ２１での処理）に従って、アーム部２７や作業部２８等を様々な位置に移動させたり、様々な姿勢に傾けたりする等の各種の作業を実行する。 The imaging robot 20 is an articulated robot equipped with an arm unit 27, a working unit 28 attached to the tip of the arm unit 27, and the like. Based on command information sent from the control device 10, or according to its own judgment processing (processing by the CPU 21 described later), it performs various operations such as moving the arm unit 27 and the working unit 28 to various positions and tilting them into various postures.

載置装置３０は、深層学習やアノテーションの対象となる対象物体４０が載置される載置台３１を有するものである。この載置台３１は、回転したり、傾けたりすることができるものである。 The mounting device 30 has a mounting table 31 on which a target object 40 that is the subject of deep learning or annotation is placed. This mounting table 31 can be rotated and tilted.

そして、制御装置１０は、撮影ロボット２０のアーム部２７を稼働させることで３Ｄカメラ２４の位置や姿勢を変化させ、これによって、載置台３１に載置された対象物体４０を様々な角度から撮影して、当該対象物体４０のアノテーションを自動的に実行したり、当該対象物体４０の撮影画像に自動的にバウンディングボックスを付与したり、または、当該撮影画像から当該対象物体４０のセグメンテーションを実行したり、更には、３Ｄカメラ２４に対する対象物体４０の３次元姿勢を学習データとして取得したりすることができるものである（詳しくは後述する）。 The control device 10 operates the arm unit 27 of the imaging robot 20 to change the position and posture of the 3D camera 24, and thereby photographs the target object 40 placed on the mounting table 31 from various angles. This allows it to automatically annotate the target object 40, automatically add a bounding box to a captured image of the target object 40, or segment the target object 40 from the captured image, and further to acquire the three-dimensional posture of the target object 40 relative to the 3D camera 24 as learning data (details are described later).

このように、本実施形態にかかる学習データベース構築システム１にあっては、作業者の手間を削減した深層学習を可能にするものである。以下では、学習データベース構築システム１の各部の構成および動作について詳細に説明する。 In this way, the learning database construction system 1 according to this embodiment enables deep learning with reduced effort on the part of the operator. The configuration and operation of each part of the learning database construction system 1 will be described in detail below.

－制御装置の構成－ -Configuration of the control device-
本実施形態にかかる学習データベース構築システム１の構成要素である制御装置１０の構成の一態様について図２を用いて説明する。制御装置１０は、主たる構成要素として、ＣＰＵ（Central Processing Unit）１１、メモリ１２、操作部１３、および、通信インターフェイス１４等を含んで構成されている。 One aspect of the configuration of the control device 10, which is a component of the learning database construction system 1 according to the present embodiment, will be described with reference to Fig. 2. The control device 10 includes, as main components, a CPU (Central Processing Unit) 11, a memory 12, an operation unit 13, a communication interface 14, and the like.

ＣＰＵ１１は、メモリ１２に記憶されているプログラムを実行することによって、制御装置１０の各部を制御する。例えば、ＣＰＵ１１は、メモリ１２に格納されているプログラムを実行し、各種のデータを参照することによって、後述する各種の処理を実行する。 The CPU 11 controls each part of the control device 10 by executing a program stored in the memory 12. For example, the CPU 11 executes a program stored in the memory 12 and performs various processes described below by referring to various data.

そして、このＣＰＵ１１は、前記プログラムによって実現される機能部として、３次元情報生成部１１ａ、アノテーション情報生成部１１ｂ、および、学習データベース構築部１１ｃを備えている。以下、これら各部の機能の概略について説明する。 The CPU 11 has three-dimensional information generating unit 11a, annotation information generating unit 11b, and learning database constructing unit 11c as functional units realized by the program. The following provides an overview of the functions of each of these units.

３次元情報生成部１１ａは、撮影ロボット２０（特に撮影ロボット２０のアーム部２７）を制御して３Ｄカメラ２４によって対象物体４０を複数の方向から撮影する３次元撮影を実施することによって対象物体４０の３次元情報を生成する機能を有している。具体的に、３次元情報生成部１１ａは、対象物体４０だけの３次元情報（３次元画像の情報）を取得する機能を有しており、撮影ロボット２０のアーム部２７を移動させたり回転させたりすることで、対象物体４０の全ての周囲３６０度分の３次元撮影を行って、取得した３６０度分のＲＧＢ＋ｄｅｐｔｈｍａｐから、対象物体４０の３次元点群データを作成する機能を有している。 The three-dimensional information generating unit 11a has a function of generating three-dimensional information of the target object 40 by controlling the imaging robot 20 (in particular, its arm unit 27) to perform three-dimensional photographing in which the 3D camera 24 photographs the target object 40 from multiple directions. Specifically, the three-dimensional information generating unit 11a acquires three-dimensional information (three-dimensional image information) of the target object 40 alone: it moves and rotates the arm unit 27 of the imaging robot 20 to photograph the target object 40 three-dimensionally over its entire 360-degree circumference, and creates three-dimensional point cloud data of the target object 40 from the acquired 360 degrees of RGB + depth maps.
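As an illustration of how per-view RGB + depth maps could be fused into a single point cloud, the following minimal sketch back-projects each depth map through a pinhole camera model and moves the resulting points into a common frame using the camera pose known from the arm. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the convention that depth is measured along the camera's +Z axis are assumptions made for this example. The publication does not specify them (it names the optical axis the X axis), and colour handling and environment subtraction are omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix4 = Sequence[Sequence[float]]  # row-major 4x4 camera-to-world transform


def backproject(depth: Sequence[Sequence[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point]:
    """Convert a depth map (0 = no measurement) into camera-frame points.

    Assumes a pinhole model with depth measured along the camera's +Z axis.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # skip pixels without a valid depth reading
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def transform(points: Sequence[Point], cam_to_world: Matrix4) -> List[Point]:
    """Apply a rigid camera-to-world transform to every point."""
    out = []
    for x, y, z in points:
        out.append(tuple(
            cam_to_world[r][0] * x + cam_to_world[r][1] * y
            + cam_to_world[r][2] * z + cam_to_world[r][3]
            for r in range(3)))
    return out


def merge_views(views: Sequence[Tuple[Sequence[Sequence[float]], Matrix4]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Fuse (depth map, camera pose) pairs from several viewpoints into one cloud."""
    cloud: List[Point] = []
    for depth, cam_to_world in views:
        cloud.extend(transform(backproject(depth, fx, fy, cx, cy), cam_to_world))
    return cloud
```

Each element of `views` pairs one depth map with the camera pose at the moment of capture, so the more viewpoints the arm visits, the more completely the merged cloud covers the object.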

アノテーション情報生成部１１ｂは、３Ｄカメラ２４で対象物体４０を撮影した画像に対して前記３次元情報に基づいてアノテーション情報を自動生成する機能を有している。具体的に、アノテーション情報生成部１１ｂは、対象物体４０の３次元点群データと、３Ｄカメラ２４によって対象物体４０を撮影した際の撮影方向（アーム部２７の姿勢に応じた３Ｄカメラ２４の姿勢により決定される撮影方向）の情報とから、対象物体４０に略外接するバウンディングボックス（またはセグメンテーションによる対象物体４０の輪郭線）を自動生成する機能を有している。 The annotation information generating unit 11b has a function of automatically generating annotation information based on the three-dimensional information for an image of the target object 40 captured by the 3D camera 24. Specifically, the annotation information generating unit 11b has a function of automatically generating a bounding box that approximately circumscribes the target object 40 (or a contour line of the target object 40 by segmentation) from the three-dimensional point cloud data of the target object 40 and information on the shooting direction when the target object 40 is captured by the 3D camera 24 (the shooting direction is determined by the attitude of the 3D camera 24 according to the attitude of the arm unit 27).
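One way such a bounding box could be derived is to project the object's point cloud into the image taken from a known camera pose and take the extent of the projected pixels. The sketch below is only an illustration under assumptions: the function name, the pinhole intrinsics, the +Z optical-axis convention, and the `world_to_cam` matrix are hypothetical, and occlusion and lens distortion are ignored.

```python
from typing import Iterable, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def bounding_box_from_cloud(points_world: Iterable[Tuple[float, float, float]],
                            world_to_cam: Sequence[Sequence[float]],
                            fx: float, fy: float, cx: float, cy: float,
                            width: int, height: int) -> Optional[BBox]:
    """Project object points into the image and return the enclosing box.

    Returns None when no point lies in front of the camera.
    """
    us, vs = [], []
    for x, y, z in points_world:
        # Move the point from the world frame into the camera frame.
        xc, yc, zc = (world_to_cam[r][0] * x + world_to_cam[r][1] * y
                      + world_to_cam[r][2] * z + world_to_cam[r][3]
                      for r in range(3))
        if zc <= 0.0:
            continue  # behind the camera, cannot appear in the image
        us.append(fx * xc / zc + cx)
        vs.append(fy * yc / zc + cy)
    if not us:
        return None

    def clamp(value: float, upper: float) -> float:
        return max(0.0, min(value, upper))

    # Clamp to the image so a partly visible object still gets a valid box.
    return (clamp(min(us), width - 1.0), clamp(min(vs), height - 1.0),
            clamp(max(us), width - 1.0), clamp(max(vs), height - 1.0))
```

Because the box is computed from geometry rather than from pixel colours, it does not depend on the contrast between object and background, which is the weakness the publication attributes to two-dimensional approaches.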

学習データベース構築部１１ｃは、前記アノテーション情報、および、３Ｄカメラ２４に対する対象物体４０の相対的な３次元姿勢の情報（本発明でいう３次元物理情報）それぞれの複数データを学習データとした学習データベースを構築する。具体的に、学習データベース構築部１１ｃは、対象物体４０を複数方向から撮影することで、各方向それぞれに対応して対象物体４０に付与されたバウンディングボックス（またはセグメンテーションによる対象物体４０の輪郭線）の情報および対象物体４０の３次元姿勢の情報を学習していき（学習データを取得していき）、これら学習データによって学習データベースを構築する機能を有している。 The learning database construction unit 11c constructs a learning database using multiple pieces of data, each of which is the annotation information and information on the three-dimensional posture of the target object 40 relative to the 3D camera 24 (three-dimensional physical information in this invention), as learning data. Specifically, the learning database construction unit 11c has the function of photographing the target object 40 from multiple directions, learning (acquiring learning data) information on the bounding box (or the contour of the target object 40 by segmentation) assigned to the target object 40 corresponding to each direction, and information on the three-dimensional posture of the target object 40, and constructing a learning database using this learning data.
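To make the pairing of annotation and relative posture concrete, the sketch below stores one record per captured image, with the object's pose expressed in the camera frame as T_cam_obj = inverse(T_world_cam) · T_world_obj. The record layout, the field names, and the use of 4x4 homogeneous matrices are assumptions for illustration; the publication does not prescribe a storage format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Mat = List[List[float]]  # row-major 4x4 homogeneous transform


def invert_rigid(t: Mat) -> Mat:
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    rt = [[t[c][r] for c in range(3)] for r in range(3)]
    p = [-sum(rt[r][c] * t[c][3] for c in range(3)) for r in range(3)]
    return [rt[r] + [p[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul(a: Mat, b: Mat) -> Mat:
    """Multiply two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


@dataclass
class LearningSample:
    image_id: str                             # identifier of the captured image
    bbox: Tuple[float, float, float, float]   # automatically generated annotation
    object_in_camera: Mat                     # object pose relative to the camera


def make_sample(image_id: str, bbox: Tuple[float, float, float, float],
                camera_in_world: Mat, object_in_world: Mat) -> LearningSample:
    """Pair an annotation with the object's pose expressed in the camera frame."""
    relative = matmul(invert_rigid(camera_in_world), object_in_world)
    return LearningSample(image_id, bbox, relative)


# A learning database is then simply a collection of such samples,
# one per viewpoint from which the object was photographed.
database: List[LearningSample] = []
```

Appending the result of `make_sample` for every viewpoint yields a set of (image, annotation, relative pose) records from which a model could later be trained to infer the pose from a two-dimensional image alone.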

メモリ１２は、各種のＲＡＭ、各種のＲＯＭ等によって実現され、制御装置１０に内包されているものであってもよいし、制御装置１０の各種インターフェイスに着脱可能なものであってもよいし、制御装置１０からアクセス可能な他の装置の記録媒体であってもよい。メモリ１２は、ＣＰＵ１１によって実行されるプログラムや、ＣＰＵ１１によるプログラムの実行により生成されたデータ、入力されたデータ、その他の本実施形態で利用されるデータベース等を記憶する。 The memory 12 is realized by various RAMs, various ROMs, etc., and may be included in the control device 10, may be detachable from various interfaces of the control device 10, or may be a recording medium of another device accessible from the control device 10. The memory 12 stores programs executed by the CPU 11, data generated by execution of programs by the CPU 11, input data, other databases used in this embodiment, etc.

操作部１３は、ユーザや管理者等の命令を受け付けて、当該命令をＣＰＵ１１に入力する。 The operation unit 13 accepts commands from a user, administrator, etc., and inputs the commands to the CPU 11.

通信インターフェイス１４は、ＣＰＵ１１からのデータを、有線ＬＡＮや無線ＬＡＮを介して撮影ロボット２０に送信したり、逆に、撮影ロボット２０からデータ（３Ｄカメラ２４からの３次元画像の情報を含む）を受信してＣＰＵ１１に受け渡したりする。 The communication interface 14 transmits data from the CPU 11 to the imaging robot 20 via a wired or wireless LAN, and conversely, receives data from the imaging robot 20 (including three-dimensional image information from the 3D camera 24) and passes it to the CPU 11.

－撮影ロボットの構成－ -Configuration of the imaging robot-
次に、学習データベース構築システム１の構成要素である撮影ロボット２０の構成の一態様について図３を用いて説明する。撮影ロボット２０は、主たる構成要素として、ＣＰＵ２１、メモリ２２、操作部２３、３Ｄカメラ２４、ライト２５、通信インターフェイス２６、アーム部２７、および、作業部２８等を含んで構成されている。 Next, one aspect of the configuration of the imaging robot 20, which is a component of the learning database construction system 1, will be described with reference to Fig. 3. The imaging robot 20 includes, as main components, a CPU 21, a memory 22, an operation unit 23, a 3D camera 24, a light 25, a communication interface 26, an arm unit 27, a working unit 28, and the like.

ＣＰＵ２１は、メモリ２２に記憶されているプログラムを実行することによって、撮影ロボット２０の各部を制御する。例えば、撮影ロボット２０の各関節部に備えられたモータの回転角度を調整することにより、アーム部２７の姿勢を制御する。 The CPU 21 controls each part of the imaging robot 20 by executing a program stored in the memory 22. For example, the CPU 21 controls the posture of the arm unit 27 by adjusting the rotation angle of the motor provided in each joint of the imaging robot 20.

メモリ２２は、各種のＲＡＭや、各種のＲＯＭ等によって実現され、各種のプログラムや、ＣＰＵ２１によるプログラムの実行により生成されたデータ、制御装置１０から与えられた操作命令、操作部２３を介して入力されたデータ等を記憶する。 The memory 22 is realized by various RAMs, various ROMs, etc., and stores various programs, data generated by execution of programs by the CPU 21, operation commands given by the control device 10, data input via the operation unit 23, etc.

操作部２３は、ボタンやスイッチ等から構成され、ユーザからの各種の命令を受け付けて、当該命令をＣＰＵ２１に入力する。 The operation unit 23 is composed of buttons, switches, etc., and accepts various commands from the user and inputs the commands to the CPU 21.

３Ｄカメラ２４は、ＲＧＢ－Ｄカメラ等によって実現される。この３Ｄカメラ２４は、例えば２つのカメラを利用することによって撮影した画像の各部までの距離を取得することができる。３Ｄカメラ２４は、ＣＰＵ２１からの指示に基づいて、３次元撮影を行ったり、通常の２次元撮影を行ったりする。 The 3D camera 24 is realized by an RGB-D camera or the like. This 3D camera 24 can obtain the distance to each part of the captured image by using, for example, two cameras. The 3D camera 24 performs three-dimensional shooting or normal two-dimensional shooting based on instructions from the CPU 21.

ライト２５は、ＣＰＵ２１からの指示に従って、３Ｄカメラ２４の前方に光を照射するものである。 The light 25 emits light in front of the 3D camera 24 according to instructions from the CPU 21.

通信インターフェイス２６は、インターネットやキャリア網やルータ等を介して、制御装置１０等の他の装置との間でデータを送受信する。例えば、通信インターフェイス２６は、制御装置１０から操作命令を受信して、ＣＰＵ２１に受け渡す。 The communication interface 26 transmits and receives data to and from other devices such as the control device 10 via the Internet, a carrier network, a router, etc. For example, the communication interface 26 receives an operation command from the control device 10 and passes it to the CPU 21.

アーム部２７は、ＣＰＵ２１からの指示に従って、アーム部２７に取り付けられた３Ｄカメラ２４の位置や姿勢を制御したり、作業部２８の位置や姿勢を制御したりする。３Ｄカメラ２４はアーム部２７に固定されている。また、アーム部２７の姿勢は、撮影ロボット２０の各関節の回転角度位置をモニタすることによって特定できる。このため、アーム部２７の姿勢を認識することにより、３Ｄカメラ２４の位置や姿勢を把握できるようになっている。例えば、３Ｄカメラ２４のレンズの光軸に沿う方向（以下、Ｘ軸方向という：図８を参照）での座標位置（以下、Ｘ軸座標位置という）、当該Ｘ軸方向に直交し且つ水平方向に延在する方向（以下、Ｙ軸方向という）での座標位置（以下、Ｙ軸座標位置という）、前記Ｘ軸方向およびＹ軸方向それぞれに直交する方向に延在する方向（以下、Ｚ軸方向という）での座標位置（以下、Ｚ軸座標位置という）を、アーム部２７の姿勢を認識することにより把握することが可能である。 The arm unit 27 controls the position and posture of the 3D camera 24 attached to the arm unit 27, and the position and posture of the working unit 28, according to instructions from the CPU 21. The 3D camera 24 is fixed to the arm unit 27. The posture of the arm unit 27 can be identified by monitoring the rotation angle position of each joint of the imaging robot 20. Therefore, by recognizing the posture of the arm unit 27, the position and posture of the 3D camera 24 can be determined. For example, recognizing the posture of the arm unit 27 makes it possible to determine the coordinate position (hereinafter, the X-axis coordinate position) in the direction along the optical axis of the lens of the 3D camera 24 (hereinafter, the X-axis direction; see FIG. 8), the coordinate position (hereinafter, the Y-axis coordinate position) in the direction that is perpendicular to the X-axis direction and extends horizontally (hereinafter, the Y-axis direction), and the coordinate position (hereinafter, the Z-axis coordinate position) in the direction perpendicular to both the X-axis direction and the Y-axis direction (hereinafter, the Z-axis direction).
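The step from monitored joint angles to camera pose is a forward-kinematics computation. The following is a deliberately simplified sketch, assuming a hypothetical chain in which every joint rotates about its local Z axis and every link extends along its local X axis, followed by a fixed flange-to-camera mounting transform. The real arm geometry, joint axes, and mounting offset are not given in the publication.

```python
import math
from typing import List, Sequence

Mat = List[List[float]]  # row-major 4x4 homogeneous transform

IDENTITY: Mat = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def matmul(a: Mat, b: Mat) -> Mat:
    """Multiply two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def rot_z(theta: float) -> Mat:
    """Rotation by theta radians about the local Z axis (one joint)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def trans_x(length: float) -> Mat:
    """Translation along the local X axis (one link)."""
    return [[1.0, 0.0, 0.0, length], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def camera_pose(joint_angles: Sequence[float], link_lengths: Sequence[float],
                flange_to_camera: Mat) -> Mat:
    """Chain joint rotations and link offsets, then the fixed camera mount."""
    pose = IDENTITY
    for theta, length in zip(joint_angles, link_lengths):
        pose = matmul(matmul(pose, rot_z(theta)), trans_x(length))
    return matmul(pose, flange_to_camera)
```

In the returned matrix, the last column holds the camera's X-, Y-, and Z-axis coordinate positions in the robot base frame. If the mounting transform aligns the camera's local X axis with the lens, the first column gives the optical-axis direction, matching the axis naming used in the text.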

作業部２８は、アーム部２７の先端に取り付けられた人の手に相当し、把持動作等を行うもので、ＣＰＵ２１からの指示に従って、対象物体４０を把持したり、対象物体４０の位置や向きを変更したりするための各種の動作を実行する。 The working unit 28 corresponds to a human hand attached to the tip of the arm unit 27 and performs grasping operations, etc., and performs various operations to grasp the target object 40 and change the position and orientation of the target object 40 according to instructions from the CPU 21.

-Information processing by the control device-
Next, the information processing of the control device 10 will be described in detail with reference to FIG. 4. As a preparation process for deep learning, the CPU 11 of the control device 10 executes the process shown in FIG. 4 according to the program in the memory 12. The process shown in FIG. 4 is performed by the three-dimensional information generation unit 11a.

First, the CPU 11 accepts in advance three-dimensional CAD data, such as the three-dimensional shape and position information of the imaging environment (table, stage, the robot itself, etc.), and registers it in the memory 12 (environment information registration, step ST1). The imaging environment here means the objects other than the target object 40 that appear in images captured by the 3D camera 24, that is, the objects to be subtracted from the captured image in later image processing. Examples include the mounting device 30, the background wall surface, and the working unit 28 when it enters the field of view of the 3D camera 24.

The CPU 11 causes the 3D camera 24 attached to the arm unit 27 of the imaging robot 20 to capture the target object 40 and acquires an RGB + depth map (step ST2). As noted above, because the 3D camera 24 is fixed to the arm unit 27, the CPU 11 can calculate the posture information of the 3D camera 24 (the X-axis, Y-axis, and Z-axis coordinate positions described above, etc.) from the posture information of the imaging robot 20 and the arm unit 27.

The CPU 11 subtracts the data of the surrounding objects registered in step ST1 (the imaging environment data) from the RGB + depth map captured in step ST2, and thereby obtains three-dimensional information (three-dimensional image information) of the target object 40 alone (object data extraction, step ST3).
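The specification does not say how the subtraction in step ST3 is carried out. One possible sketch, assuming both the captured scene and the registered environment are already expressed as 3D points in a common frame, discards every scene point that falls into a voxel occupied by the environment. The voxel size and all function names below are illustrative assumptions.

```python
import math

def voxel_key(point, voxel_size):
    """Integer index of the voxel that contains the point."""
    return tuple(math.floor(c / voxel_size) for c in point)

def subtract_environment(scene_points, environment_points, voxel_size=0.01):
    """Keep only scene points whose voxel is not occupied by the environment."""
    occupied = {voxel_key(p, voxel_size) for p in environment_points}
    return [p for p in scene_points if voxel_key(p, voxel_size) not in occupied]

# Hypothetical data: two table points (environment) plus one point on the object.
environment = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
scene = environment + [(0.05, 0.0, 0.2)]
object_points = subtract_environment(scene, environment)
```

A voxel test is coarse: an object point that shares a voxel with the environment is also removed, so the voxel size would need to be chosen relative to the sensor noise.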

The CPU 11 moves or rotates the arm unit 27 of the imaging robot 20, or rotates or tilts the mounting base 31 (imaging direction change, step ST4), and captures an image from a different angle (3D imaging, step ST2). That is, the CPU 11 repeats steps ST2 to ST5 until three-dimensional imaging of the full 360 degrees around the target object 40 is complete (until a YES determination is made in step ST5).

The CPU 11 creates three-dimensional point cloud data of the target object 40 from the RGB + depth maps covering 360 degrees around it (3D point cloud data creation, step ST6). Specifically, as shown in FIG. 5, three-dimensional solid point cloud data is created from the three-dimensional captured images of the target object 40.
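As a rough illustration of step ST6, the sketch below back-projects each depth map into camera-frame points and moves them into a common world frame using the camera pose known from the arm. It assumes a pinhole camera with the X axis along the optical axis (as defined for FIG. 8), and additionally assumes Y points right and Z points down in the image; the intrinsics `focal_px`, `cx`, `cy` and all function names are hypothetical.

```python
def backproject(depth_map, focal_px, cx, cy):
    """Turn a depth map (rows of depths along the optical axis, 0 = no data)
    into camera-frame points (x forward, y right, z down)."""
    points = []
    for v, row in enumerate(depth_map):
        for u, depth in enumerate(row):
            if depth <= 0:
                continue  # skip pixels without a depth measurement
            points.append((depth,
                           (u - cx) * depth / focal_px,
                           (v - cy) * depth / focal_px))
    return points

def to_world(points, rotation, translation):
    """Apply the camera-to-world pose (3x3 rotation, 3-vector translation)."""
    return [tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                  for i in range(3))
            for p in points]

def merge_views(views, focal_px, cx, cy):
    """views: iterable of (depth_map, rotation, translation), one per camera pose."""
    cloud = []
    for depth_map, rotation, translation in views:
        cloud.extend(to_world(backproject(depth_map, focal_px, cx, cy),
                              rotation, translation))
    return cloud

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
turned_180 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]  # camera facing back along world -X
cloud = merge_views([([[0, 2.0]], identity, (0, 0, 0)),
                     ([[0, 2.0]], turned_180, (4, 0, 0))],
                    focal_px=100, cx=0, cy=0)
```

A practical pipeline would normally refine the alignment between views and remove duplicate points, which this sketch omits.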

Preferably, based on the point cloud created in step ST6, the CPU 11 also moves the arm unit 27 so that regions where points are missing or noisy can be captured in more detail, performs additional imaging with the 3D camera 24, and re-synthesizes the three-dimensional point cloud (data complement processing, step ST7).
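How under-sampled regions are detected in step ST7 is not specified. A minimal sketch, assuming a simple density criterion, counts points per voxel and reports the voxels that contain fewer points than a threshold as candidates for additional imaging. The voxel size, threshold, and function name are illustrative assumptions.

```python
import math
from collections import Counter

def sparse_voxels(points, voxel_size, min_points):
    """Voxel indices that contain at least one point but fewer than min_points."""
    counts = Counter(tuple(math.floor(c / voxel_size) for c in p) for p in points)
    return sorted(key for key, n in counts.items() if n < min_points)

# Hypothetical cloud: three points in one voxel, a single point in its neighbour.
cloud = [(0.01, 0.01, 0.01), (0.02, 0.03, 0.01), (0.04, 0.02, 0.03), (0.12, 0.01, 0.01)]
revisit = sparse_voxels(cloud, voxel_size=0.1, min_points=2)
```

Voxels that contain no points at all never appear in the counts, so detecting true holes would need an additional check against the expected surface.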

Because the solid point cloud data is created in this way, it corresponds to the three-dimensional information referred to in the present invention, and the step of creating it corresponds to the three-dimensional information generation step referred to in the present invention.

After the preparation process shown in FIG. 4, the CPU 11 of the control device 10 goes on to execute the process shown in FIG. 6 as the deep learning process, according to the program in the memory 12. The process shown in FIG. 6 is performed by the annotation information generation unit 11b and the learning database construction unit 11c.

The CPU 11 causes the 3D camera 24 attached to the arm unit 27 of the imaging robot 20 to capture a two-dimensional image of the target object 40 (step ST11).

Based on the position and posture information of the imaging robot 20 and the arm unit 27, the position and posture information of the target object 40, and the three-dimensional point cloud data of the target object 40, the CPU 11 calculates how the target object 40 appears and automatically creates annotation information (step ST12).

Specifically, in this embodiment, as shown for example in FIGS. 7(a) to 7(c), the CPU 11 automatically generates, as annotation information, a bounding box that approximately circumscribes the target object 40 from the three-dimensional point cloud data of the target object 40 and the imaging direction. FIG. 7(a) shows the bounding box assigned when the target object 40 is captured from the front, and FIG. 7(b) shows the bounding box assigned when it is captured from diagonally in front. FIGS. 7(a) and 7(b) show bounding boxes enclosed by horizontal and vertical lines. Such a bounding box is defined by the coordinate positions of the four points where the horizontal and vertical lines intersect, and in this embodiment the coordinate positions of these four points serve as learning data.
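A minimal sketch of how such a box could be derived is given below. It assumes the point cloud has already been transformed into the camera frame, a pinhole camera with X along the optical axis (as in FIG. 8), and, as additional assumptions, Y pointing right and Z pointing down in the image. The intrinsics and function names are hypothetical.

```python
def project(point_cam, focal_px, cx, cy):
    """Pinhole projection of a camera-frame point (x forward, y right, z down)."""
    x, y, z = point_cam
    return (cx + focal_px * y / x, cy + focal_px * z / x)

def axis_aligned_box(points_cam, focal_px, cx, cy):
    """Four corner pixels of the horizontal/vertical box around the projected points."""
    pixels = [project(p, focal_px, cx, cy) for p in points_cam if p[0] > 0]
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    u_min, u_max, v_min, v_max = min(us), max(us), min(vs), max(vs)
    return [(u_min, v_min), (u_max, v_min), (u_max, v_max), (u_min, v_max)]

# Hypothetical object points in the camera frame, 2 m to 4 m in front of the lens.
points = [(2.0, -0.5, -0.25), (2.0, 0.5, 0.25), (4.0, 0.0, 0.0)]
corners = axis_aligned_box(points, focal_px=100, cx=320, cy=240)
# corners == [(295.0, 227.5), (345.0, 227.5), (345.0, 252.5), (295.0, 252.5)]
```

The four returned corners correspond to the four intersection points that the embodiment treats as learning data.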

FIG. 7(c) shows the rotated bounding box assigned when the target object 40 is captured from diagonally behind. A rotated bounding box is obtained by removing, when setting the minimum circumscribing rectangle for the target object 40, the restriction that the straight lines constituting the bounding box be horizontal and vertical. It can therefore define a minimum circumscribing rectangle for the target object 40 that covers a smaller area than a bounding box defined by horizontal and vertical lines. When a rotated bounding box is assigned, the coordinate positions of the four points where its sides intersect serve as learning data.
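The specification does not state how the minimum circumscribing rectangle is computed. One standard approach, shown here as an assumption, builds the convex hull of the projected pixels and tests a rectangle aligned with each hull edge, since a minimum-area enclosing rectangle always has one side collinear with a hull edge. The sketch expects at least three non-collinear points; all names are illustrative.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_area_rect(points):
    """Return (area, corners) of the smallest rectangle enclosing the points."""
    hull = convex_hull(points)
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        length = math.hypot(x1 - x0, y1 - y0)
        ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit vector along the edge
        vx, vy = -uy, ux                                  # unit normal to the edge
        a = [p[0] * ux + p[1] * uy for p in hull]         # extents along the edge
        b = [p[0] * vx + p[1] * vy for p in hull]         # extents across the edge
        area = (max(a) - min(a)) * (max(b) - min(b))
        if best is None or area < best[0]:
            corners = [(s * ux + t * vx, s * uy + t * vy)
                       for s, t in ((min(a), min(b)), (max(a), min(b)),
                                    (max(a), max(b)), (min(a), max(b)))]
            best = (area, corners)
    return best

# A square standing on one corner: the axis-aligned box has area 4,
# while the rotated box has area of about 2.
area, corners = min_area_rect([(1, 0), (0, 1), (-1, 0), (0, -1)])
```

The example reproduces the property described above: the rotated rectangle covers a smaller area than the box bounded by horizontal and vertical lines.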

Instead of a bounding box, the annotation information may be the contour of the target object 40 (the boundary line separating the target object 40 from the background), generated automatically by segmentation. In this case, that boundary line serves as learning data.
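As an illustration of the contour variant, the sketch below assumes a binary mask of the object is already available (for example, rasterized from the projected point cloud, which the specification does not detail) and extracts the object pixels that touch the background or the image border. The function name and the 4-neighbour rule are assumptions.

```python
def boundary_pixels(mask):
    """Object pixels (truthy) that have a background or out-of-image 4-neighbour."""
    height, width = len(mask), len(mask[0])
    boundary = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < height and 0 <= cc < width)
                if outside or not mask[rr][cc]:
                    boundary.append((r, c))
                    break
    return boundary

# A 3x3 object in a 5x5 image: every object pixel except the centre is on the contour.
mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
contour = boundary_pixels(mask)
```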

Steps ST11 and ST12 above correspond to the annotation information generation step performed by the annotation information generation unit 11b.

In step ST13, in addition to the information acquired in step ST12, the three-dimensional posture of the target object 40 as seen from the 3D camera 24 (that is, relative to the 3D camera 24) in the camera's posture at the time of imaging is acquired. It is derived from the posture of the arm unit 27 of the imaging robot 20 at the time of imaging (which corresponds to the posture of the 3D camera 24) and the three-dimensional posture information of the target object 40 acquired in the preparation process described above (acquired in step ST2). This three-dimensional posture is also learning data as referred to in the present invention. In general, as shown in FIG. 8, the three-dimensional posture of the target object 40 relative to the 3D camera 24 can be obtained from the deviations between the X-axis, Y-axis, and Z-axis directions of the lens of the 3D camera 24 and, respectively, the X-axis direction (horizontal), the Y-axis direction (horizontal, orthogonal to the X-axis direction), and the Z-axis direction (vertical) at the installation position of the target object 40.
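One common way to express this relative posture, offered here as an assumption rather than the method of the specification, is to represent the camera pose and the object pose in a shared world frame as a rotation matrix plus a translation, and then compute the object pose in the camera frame as R_rel = R_cam^T R_obj and t_rel = R_cam^T (t_obj - t_cam). All names are illustrative.

```python
def relative_pose(r_cam, t_cam, r_obj, t_obj):
    """Pose of the object expressed in the camera frame.

    r_cam, r_obj: 3x3 rotation matrices (frame-to-world); t_cam, t_obj: positions in world.
    """
    r_cam_t = [[r_cam[j][i] for j in range(3)] for i in range(3)]  # transpose = inverse rotation
    r_rel = [[sum(r_cam_t[i][k] * r_obj[k][j] for k in range(3)) for j in range(3)]
             for i in range(3)]
    offset = [t_obj[i] - t_cam[i] for i in range(3)]
    t_rel = [sum(r_cam_t[i][k] * offset[k] for k in range(3)) for i in range(3)]
    return r_rel, t_rel

# Camera at (1, 0, 0), yawed 90 degrees so its optical (X) axis points along world +Y.
r_cam = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
r_rel, t_rel = relative_pose(r_cam, (1, 0, 0), identity, (1, 2, 0))
# t_rel == [2, 0, 0]: the object sits 2 units straight ahead along the optical axis.
```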

Then, in step ST14, the CPU 11 moves or rotates the arm unit 27 of the imaging robot 20 (imaging direction change, step ST14) and captures an image from a different angle (imaging, step ST11). That is, the CPU 11 repeats steps ST11 to ST15 until two-dimensional imaging of the full 360 degrees around the target object 40 is complete (until a YES determination is made in step ST15). As a result, the bounding box information (or the contour of the target object 40 obtained by segmentation) assigned to the target object 40 for each of the multiple directions, together with the three-dimensional posture information of the target object 40 (its three-dimensional posture relative to the 3D camera 24), is accumulated as learning data, and the learning database is constructed from this learning data.
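The loop over steps ST11 to ST15 can be outlined as follows. This is only a structural sketch: the record layout and the three callables (`capture`, `annotate`, `pose_of`) are hypothetical stand-ins for the robot, camera, and annotation routines, none of which are defined in the specification.

```python
from dataclasses import dataclass

@dataclass
class LearningRecord:
    image_id: str          # reference to the captured 2D image (ST11)
    annotation: list       # e.g. four bounding-box corners or a contour (ST12)
    relative_pose: tuple   # object posture relative to the camera (ST13)

def build_database(view_angles_deg, capture, annotate, pose_of):
    """Collect one learning record per imaging direction."""
    database = []
    for angle in view_angles_deg:   # ST14 changes the direction; ST15 ends the loop
        image_id = capture(angle)   # ST11: two-dimensional imaging
        database.append(LearningRecord(image_id, annotate(angle), pose_of(angle)))
    return database

# Stub callables so the sketch runs without hardware.
database = build_database(
    range(0, 360, 90),
    capture=lambda a: f"img_{a:03d}",
    annotate=lambda a: [(0, 0), (1, 0), (1, 1), (0, 1)],
    pose_of=lambda a: (a, 0, 0),
)
```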

Once such a learning database has been constructed, the three-dimensional posture of the target object 40 can be estimated (an inference model can be generated) from a single two-dimensional image acquired by the 3D camera 24. For example, when the working unit 28 of the imaging robot 20 is to grasp the target object 40, acquiring a single two-dimensional image with the 3D camera 24 is enough to estimate the three-dimensional posture of the target object 40 by referring to the learning database (for example, by extracting the learning data that matches the acquired two-dimensional image) and to move the working unit 28 toward the optimal grasping position.

Steps ST13 to ST15 above correspond to the learning database construction step performed by the learning database construction unit 11c.

-Effects of the embodiment-
As described above, in this embodiment the learning database is constructed using, as learning data, multiple items of both the annotation information and the three-dimensional posture information of the target object 40 relative to the 3D camera 24. This makes it possible to construct a learning database that improves the recognition accuracy for the target object 40 (the recognition accuracy of its three-dimensional posture). For example, when the working unit 28 of the imaging robot 20 grasps the target object 40, the relative three-dimensional posture information of the target object 40 can be obtained simply by acquiring a two-dimensional image of it. The recognition accuracy for the target object 40 can be raised on the basis of this posture information, so that the target object 40 can be grasped at an optimal position (for example, near its center of gravity).

Other embodiments of the present invention are described below.

<Second embodiment>
In addition to the embodiment described above, the CPU 11 preferably uses the results of recognizing the target object 40 by deep learning based on the annotation information that was automatically created earlier. For subsequent captured images containing the target object 40, it calculates the similarity between the annotation information of the target object 40 calculated in step ST11 and the information recognized by deep learning and, when the similarity is large, performs annotation processing with greater emphasis on angles close to that angle.
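The specification does not define the similarity measure. If both the automatically generated annotation and the deep-learning result are axis-aligned boxes, one widely used candidate is intersection over union (IoU); the sketch below is offered under that assumption, with boxes given as hypothetical `(u_min, v_min, u_max, v_max)` tuples.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (u_min, v_min, u_max, v_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

auto_box = (0, 0, 2, 2)        # box from the point cloud and imaging direction
recognized_box = (1, 1, 3, 3)  # box returned by the trained recognizer
similarity = iou(auto_box, recognized_box)  # 1/7, since the overlap is 1 and the union is 7
```

The resulting value ranges from 0 (no overlap) to 1 (identical boxes) and could then be compared against a threshold to decide which angles to emphasize.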

<Third embodiment>
In addition to the embodiment described above, lights may be provided on the imaging robot 20, the mounting device 30, the ceiling, the wall surface, and so on. In step ST14, the CPU 11 then moves or rotates the arm unit 27 of the imaging robot 20, turns the lights on or off, changes their luminous intensity, or changes their color, and captures an image from a different angle. That is, the CPU 11 repeats steps ST11 to ST15 until two-dimensional imaging under the various lighting conditions has been completed for the full 360 degrees around the target object 40 (until a YES determination is made in step ST15).
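The combinations of imaging direction and lighting condition can be enumerated up front, as in the sketch below. The angle step, the light settings, and the dictionary keys are hypothetical; the specification lists only the kinds of changes (on/off, intensity, color) and not concrete values.

```python
from itertools import product

def capture_plan(angles_deg, light_settings):
    """Every (angle, light setting) pair; steps ST11 to ST15 would run once per pair."""
    return list(product(angles_deg, light_settings))

light_settings = [
    {"on": False},
    {"on": True, "intensity": 0.5, "color": "white"},
    {"on": True, "intensity": 1.0, "color": "white"},
    {"on": True, "intensity": 1.0, "color": "red"},
]
plan = capture_plan(range(0, 360, 30), light_settings)  # 12 angles x 4 settings
```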

<Fourth embodiment>
In addition to the embodiment described above, the working unit 28 mounted on the imaging robot 20 may change the orientation and posture of the target object 40. In this case the three-dimensional shape of the target object 40 changes, so information on the changed orientation and posture is linked to the three-dimensional shape of the target object 40 at that time, and the pairs are stored in the memory 12 separately for each posture of the target object 40.

When carrying out the deep learning process (the process shown in the flowchart of FIG. 6), the CPU 11 reads the orientation and posture of the target object 40 stored in the memory 12, has the working unit 28 of the imaging robot 20 put the target object 40 into the registered orientation and posture, and then carries out the deep learning process.

<Fifth embodiment>
In the embodiment described above, the CPU 11 causes the imaging robot 20 to perform two-dimensional imaging in step ST11, but it may instead cause the imaging robot 20 to perform three-dimensional imaging. Annotation information may then be assigned to each set of three-dimensional imaging data based on the 3D point cloud data.

<Sixth embodiment>
In the embodiments described above, the 3D camera 24 used in the preparation process shown in FIG. 4 is also used for imaging in the deep learning process shown in FIG. 6. The invention is not limited to this: a camera other than the 3D camera 24 used in the preparation process of FIG. 4 may be used for imaging in the deep learning process of FIG. 6.

<Seventh embodiment>
The embodiments described above accept three-dimensional CAD data, such as the three-dimensional shape and position information of the imaging environment (table, stage, the robot itself, etc.), and register it in the memory 12. However, the three-dimensional information of the imaging environment may instead be acquired by the same method as for the target object 40.

With reference to FIG. 9, in this embodiment, before the target object 40 is placed, three-dimensional information of the imaging environment is acquired by the same method as the target-object acquisition method shown in FIG. 4. If environment information has not been registered (NO determination in step ST8), the acquired three-dimensional information is registered in the memory 12 as environment data (step ST9).

Thereafter, as in the first embodiment, the CPU 11 repeats steps ST2 to ST5 until three-dimensional imaging of the full 360 degrees around the target object 40 is complete (until a YES determination is made in step ST5).

<Eighth embodiment>
Some or all of the roles of the devices in the learning database construction system 1 of the embodiments described above, such as the control device 10 and the imaging robot 20, may be performed by other devices. For example, part of the role of the control device 10 may be performed by the imaging robot 20, by multiple personal computers, or by multiple servers in the cloud.

<Ninth embodiment>
The first embodiment described above constructs a learning database used to estimate the three-dimensional posture of the target object 40 relative to the 3D camera 24. In this embodiment, the present invention is instead applied to constructing a learning database used for operations such as grasping a specific position on the target object 40 or machining a specific position.

FIG. 10 illustrates the inference of a specific position on the target object 40 according to this embodiment. Position P1 on the target object 40 in FIG. 10 is the position to be grasped by the working unit 28, for example a position near the center of gravity of the target object 40. In this embodiment, the learning data acquired by the learning database construction unit 11c is learned as the position on the target object 40 to be grasped by the working unit 28 (position P1). That is, in step ST13 of the deep learning process described with reference to FIG. 6, the specific position (for example, position P1) of the target object 40 as seen from the 3D camera 24 (relative to the 3D camera 24) in the camera's posture at the time of imaging is acquired from the posture of the arm unit 27 of the imaging robot 20 at the time of imaging (which corresponds to the posture of the 3D camera 24) and the three-dimensional posture information of the target object 40 acquired in the preparation process described above. By capturing the target object 40 from multiple directions in this way, the bounding box information (or the contour of the target object 40 obtained by segmentation) assigned for each direction and the specific-position information of the target object 40 are accumulated as learning data, and the learning database is constructed from this learning data. The position on the target object 40 to be grasped by the working unit 28 can then be estimated from a single two-dimensional image.
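To illustrate how a specific position such as P1 could be expressed relative to the camera, the sketch below transforms a world-frame point into the camera frame and projects it to a pixel. It assumes the camera pose is given as a rotation matrix and position in the world frame, a pinhole camera with X along the optical axis (FIG. 8), and Y right and Z down in the image; the intrinsics, coordinates, and function names are hypothetical.

```python
def world_to_camera(r_cam, t_cam, p_world):
    """Express a world-frame point in the camera frame: R_cam^T (p - t_cam)."""
    offset = [p_world[i] - t_cam[i] for i in range(3)]
    return [sum(r_cam[k][i] * offset[k] for k in range(3)) for i in range(3)]

def to_pixel(p_cam, focal_px, cx, cy):
    """Pinhole projection with x forward, y right, z down."""
    x, y, z = p_cam
    return (cx + focal_px * y / x, cy + focal_px * z / x)

# Camera at (1, 0, 0) with its optical axis along world +Y; hypothetical grasp point P1.
r_cam = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
p1_cam = world_to_camera(r_cam, (1, 0, 0), (1, 2, 0.5))   # 2 units ahead, 0.5 along camera Z
p1_pixel = to_pixel(p1_cam, focal_px=100, cx=320, cy=240)
```

Either the camera-frame coordinates or the pixel position could be stored alongside the annotation as the specific-position learning data; which one the embodiment uses is not stated.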

In this case, the TCP (Tool Center Point) of the imaging robot 20 may also be learned. That is, the motion of the arm unit 27 of the imaging robot 20 is controlled based on this TCP (for example, position P2 in FIG. 10), while, when grasping the target object 40, the working unit 28 is controlled based on the position to be grasped (position P1).

The configuration of this embodiment is not limited to grasping the target object 40; it can also be used for machining or similar processing at a specific position on the target object 40. For example, when position P3 on the target object 40 in FIG. 10 is to be machined (for example, welded), the learning data acquired by the learning database construction unit 11c is learned as the machining position on the target object 40, so that the machining position can be estimated from a single two-dimensional image.

<Other embodiments>
The present invention is not limited to the embodiments described above, and all modifications and applications falling within the scope of the claims and their equivalents are possible.

For example, each of the above embodiments describes applying the present invention to construct a learning database used when the imaging robot 20 grasps the target object 40 or machines a specific position on it. The present invention is not limited to this and may construct a learning database for other uses. For example, it may construct a learning database for a system that recognizes the posture of a target object 40 conveyed on a conveyor at the moment it reaches a predetermined position, and then notifies an operator of any target object 40 that is not in the predetermined posture or has the imaging robot 20 correct the posture.

In each of the above embodiments, the mounting device 30 is included as a component of the learning database construction system 1, but the mounting device 30 is not an essential component of the present invention. For example, the learning database may be constructed by imaging a target object 40 placed on the floor.

The present invention is applicable to a learning database construction system for improving the recognition accuracy for a target object.

1 Learning database construction system
10 Control device
11a Three-dimensional information generation unit
11b Annotation information generation unit
11c Learning database construction unit
20 Imaging robot (movable body)
24 3D camera (three-dimensional camera)
40 Target object

To achieve the above object, the solution of the present invention comprises a three-dimensional camera capable of photographing a target object, a movable body capable of changing the position and posture of the three-dimensional camera, and a control device capable of acquiring information on an image of the target object captured by the three-dimensional camera. The control device comprises: a three-dimensional information generation unit that generates three-dimensional information of the target object by controlling the movable body to perform three-dimensional imaging in which the three-dimensional camera captures the target object from a plurality of directions; an annotation information generation unit that, for an image of the target object captured by the three-dimensional camera or by a camera other than the three-dimensional camera, automatically generates a bounding box circumscribing the target object as annotation information from the three-dimensional information and information on the imaging direction at the time of capture; and a learning database construction unit that constructs a learning database using, as learning data, multiple items of both the annotation information and the three-dimensional physical information of the target object relative to the camera used when automatically generating the annotation information. The bounding box circumscribing the target object includes a rotated bounding box, which is obtained by removing the restriction that the straight lines constituting the bounding box be horizontal and vertical, covers a smaller area than a bounding box defined by horizontal and vertical lines for the target object, and sets the minimum circumscribing rectangle for the target object.

A control device used in the learning database construction system also falls within the technical idea of the present invention. That is, it is a control device capable of acquiring information on an image of a target object captured by a three-dimensional camera that is supported by a movable body and whose position and posture can be changed by operating the movable body. The control device comprises: a three-dimensional information generation unit that generates three-dimensional information of the target object by controlling the movable body to perform three-dimensional imaging in which the three-dimensional camera captures the target object from a plurality of directions; an annotation information generation unit that, for an image of the target object captured by the three-dimensional camera or by a camera other than the three-dimensional camera, automatically generates a bounding box circumscribing the target object as annotation information from the three-dimensional information and information on the imaging direction at the time of capture; and a learning database construction unit that constructs a learning database using, as learning data, multiple items of both the annotation information and the three-dimensional physical information of the target object relative to the camera used when automatically generating the annotation information. The bounding box circumscribing the target object includes a rotated bounding box, which is obtained by removing the restriction that the straight lines constituting the bounding box be horizontal and vertical, covers a smaller area than a bounding box defined by horizontal and vertical lines for the target object, and sets the minimum circumscribing rectangle for the target object.

また、前記学習データベース構築システムにおいて実施される学習データベース構築方法も本発明の技術的思想の範疇である。つまり、３次元カメラを支持する可動体を制御して前記３次元カメラによって対象物体を複数の方向から撮影する３次元撮影を実施することによって前記対象物体の３次元情報を生成する３次元情報生成工程と、前記３次元カメラまたは当該３次元カメラとは別のカメラで前記対象物体を撮影した画像に対して前記３次元情報と前記対象物体を撮影した際の撮影方向の情報とから、前記対象物体に外接するバウンディングボックスをアノテーション情報として自動生成するアノテーション情報生成工程と、前記アノテーション情報、および、前記アノテーション情報の自動生成時に使用した前記カメラに対する前記対象物体の相対的な３次元物理情報それぞれの複数データを学習データとした学習データベースを構築する学習データベース構築工程とを含み、前記対象物体に外接する前記バウンディングボックスは、当該バウンディングボックスを構成する直線として水平線および垂直線であることの制限を外すことで得られると共に、前記対象物体に対して水平線および垂直線で規定したバウンディングボックスに比べて領域が小さく且つ当該対象物体に対する最小外接矩形を設定する回転バウンディングボックスを含んでいるものである。 Furthermore, the learning database construction method carried out in the learning database construction system also falls within the scope of the technical idea of the present invention. Specifically, the method includes: a three-dimensional information generating step of generating three-dimensional information of a target object by controlling a movable body that supports a three-dimensional camera so as to perform three-dimensional photography in which the target object is photographed from a plurality of directions by the three-dimensional camera; an annotation information generating step of automatically generating, for an image of the target object captured by the three-dimensional camera or by a camera other than the three-dimensional camera, a bounding box circumscribing the target object as annotation information from the three-dimensional information and information on the photographing direction at the time the target object was photographed; and a learning database construction step of constructing a learning database that uses, as learning data, multiple data items of each of the annotation information and the three-dimensional physical information of the target object relative to the camera used when automatically generating the annotation information. The bounding box circumscribing the target object includes a rotated bounding box, which is obtained by removing the restriction that the straight lines constituting the bounding box be horizontal and vertical lines, covers a smaller region than a bounding box defined by horizontal and vertical lines for the target object, and defines the minimum circumscribed rectangle of the target object.
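The difference between a bounding box defined by horizontal and vertical lines and a rotated bounding box can be illustrated with a minimal sketch. It is not the implementation of the embodiment: the pinhole intrinsics (`f`, `cx`, `cy`), the plate-shaped test object, and all function names are assumptions introduced for illustration. The sketch projects three-dimensional points given in the camera frame into the image, and then finds the minimum-area circumscribed rectangle by testing each edge of the convex hull, relying on the known property that such a rectangle has one side collinear with a hull edge.

```python
import math

def project(points_cam, f=800.0, cx=320.0, cy=240.0):
    """Pinhole projection of 3D points (camera frame, z > 0) to pixel coordinates.
    The intrinsics f, cx, cy are assumed values, not taken from the source."""
    return [(f * x / z + cx, f * y / z + cy) for x, y, z in points_cam]

def convex_hull(pts):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def axis_aligned_box(pts):
    """Bounding box restricted to horizontal and vertical lines."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)

def min_area_rotated_box(pts):
    """Minimum-area circumscribed rectangle, found by aligning a candidate
    rectangle with each hull edge. Returns (area, angle_rad, corners)."""
    hull = convex_hull(pts)
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Express the hull in a frame whose first axis lies along this edge.
        us = [c * x + s * y for x, y in hull]
        vs = [-s * x + c * y for x, y in hull]
        umin, umax, vmin, vmax = min(us), max(us), min(vs), max(vs)
        area = (umax - umin) * (vmax - vmin)
        if best is None or area < best[0]:
            box = [(umin, vmin), (umax, vmin), (umax, vmax), (umin, vmax)]
            # Rotate the rectangle corners back into image coordinates.
            corners = [(c * u - s * v, s * u + c * v) for u, v in box]
            best = (area, theta, corners)
    return best

# Hypothetical plate-shaped object, 0.20 m x 0.05 m, rotated 30 degrees about
# the optical axis and placed 1 m in front of the camera.
a = math.radians(30.0)
plate = [(-0.1, -0.025), (0.1, -0.025), (0.1, 0.025), (-0.1, 0.025)]
points_cam = [(math.cos(a) * x - math.sin(a) * y,
               math.sin(a) * x + math.cos(a) * y, 1.0) for x, y in plate]
pixels = project(points_cam)

bx0, by0, bx1, by1 = axis_aligned_box(pixels)
aabb_area = (bx1 - bx0) * (by1 - by0)
rot_area, rot_angle, rot_corners = min_area_rotated_box(pixels)
print(f"axis-aligned area: {aabb_area:.1f} px^2, rotated area: {rot_area:.1f} px^2")
```

For this elongated, tilted object the rotated box covers about 160 x 40 pixels, roughly one third of the area of the axis-aligned box, which is the reduction in region the paragraph above describes.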

Claims

1. A learning database construction system comprising:
a three-dimensional camera capable of photographing a target object;
a movable body capable of changing the position and attitude of the three-dimensional camera; and
a control device capable of acquiring information on an image of the target object captured by the three-dimensional camera,
wherein the control device includes:
a three-dimensional information generating unit that generates three-dimensional information of the target object by controlling the movable body to perform three-dimensional photography in which the target object is photographed from a plurality of directions by the three-dimensional camera;
an annotation information generating unit that automatically generates annotation information, based on the three-dimensional information, for an image of the target object captured by the three-dimensional camera or a camera other than the three-dimensional camera;
and a learning database construction unit that constructs a learning database using, as learning data, multiple data items of each of the annotation information and the three-dimensional physical information of the target object relative to the camera used when automatically generating the annotation information.

2. The learning database construction system according to claim 1, wherein the three-dimensional physical information is three-dimensional posture information of the target object relative to the camera used when automatically generating the annotation information.

3. The learning database construction system according to claim 1, wherein the three-dimensional physical information is information on a specific three-dimensional position of the target object relative to the camera used when automatically generating the annotation information.

4. A control device capable of acquiring information on an image of a target object captured by a three-dimensional camera that is supported on a movable body and whose position and attitude can be changed by operating the movable body, the control device comprising:
a three-dimensional information generating unit that generates three-dimensional information of the target object by controlling the movable body to perform three-dimensional photography in which the target object is photographed from a plurality of directions by the three-dimensional camera;
an annotation information generating unit that automatically generates annotation information, based on the three-dimensional information, for an image of the target object captured by the three-dimensional camera or a camera other than the three-dimensional camera;
and a learning database construction unit that constructs a learning database using, as learning data, multiple data items of each of the annotation information and the three-dimensional physical information of the target object relative to the camera used when automatically generating the annotation information.

5. A learning database construction method comprising:
a three-dimensional information generating step of generating three-dimensional information of a target object by controlling a movable body supporting a three-dimensional camera to perform three-dimensional photography in which the target object is photographed from a plurality of directions by the three-dimensional camera;
an annotation information generating step of automatically generating annotation information, based on the three-dimensional information, for an image of the target object captured by the three-dimensional camera or a camera other than the three-dimensional camera;
and a learning database construction step of constructing a learning database using, as learning data, multiple data items of each of the annotation information and the three-dimensional physical information of the target object relative to the camera used when automatically generating the annotation information.
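
The relative three-dimensional posture and position information recited in the claims can be illustrated with a minimal sketch, assuming that the pose of the camera (for example, from the movable body's kinematics) and the pose of the target object are both known in a common world frame. The frame conventions, the `LearningSample` record, its field names, and the numeric values are hypothetical and are not taken from the source. The object pose relative to the camera is obtained by composing the inverse of the camera pose with the object pose.

```python
import math
from dataclasses import dataclass

def rot_z(deg):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(R):
    return [list(row) for row in zip(*R)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def relative_pose(R_wc, t_wc, R_wo, t_wo):
    """Pose of the object in the camera frame.
    R_wc, t_wc: camera pose in the world frame (assumed known).
    R_wo, t_wo: object pose in the world frame (assumed known)."""
    R_cw = transpose(R_wc)  # the inverse of a rotation matrix is its transpose
    R_co = matmul(R_cw, R_wo)
    t_co = matvec(R_cw, [o - c for o, c in zip(t_wo, t_wc)])
    return R_co, t_co

@dataclass
class LearningSample:
    """Hypothetical record pairing annotation with relative 3D information."""
    image_id: str
    annotation: dict          # e.g. a rotated bounding box for the image
    rotation_cam_obj: list    # relative three-dimensional posture
    position_cam_obj: list    # relative three-dimensional position

# Illustrative values: camera 1 m above the origin and turned 90 degrees about
# z; object 1 m along the world x axis at the same height, unrotated.
R_co, t_co = relative_pose(rot_z(90.0), [0.0, 0.0, 1.0],
                           rot_z(0.0), [1.0, 0.0, 1.0])
sample = LearningSample(
    image_id="view_000",                                  # placeholder id
    annotation={"type": "rotated_bbox", "corners": []},   # placeholder
    rotation_cam_obj=R_co,
    position_cam_obj=t_co,
)
database = [sample]  # one such record per captured view
print(sample.position_cam_obj)
```

Storing the rotation and translation separately keeps the posture information and the position information individually accessible, matching the two kinds of three-dimensional physical information distinguished in the dependent claims.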